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ABSTRACT 


This  thesis  is  aiasd  at  deteraining  tha  worst  case 
asymptotical  tiaa  coaplaxity  behaviour  of  algorithas  for 
relational  operations  that  work  on  extensionally  or 
intensionally  represented  binary  relations.  Those  relational 
operations  coae  froa  a  relational  language  being  designed  at 
Naval  Postgraduate  School.  One  particular  extensional 
representation  technique  and  two  intensional  representation 
techniques  are  proposed.  The  above  analysis  in  turn 
deteraines  the  feasibility  of  iapleaenting  a  subset  of  the 
relational  language  on  conventional  architectures. 
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This  thesis  is  aiaed  at  analysing  the  worst  case 
asymptotical  time  complexity  behaviour  of  the  algorithms 
associated  with  the  relational  operations  that  work  on  the 
extensionally  or  intensionally  represented  binary  relations. 

The  most  obvious  representation  of  a  relation  is  the 
extensional  representation#  in  which  all  the  elements  of  a 
relation  are  explicitly  represented  in  memory.  There  are 
various  kinds  of  extensional  representation  techniques  that 
will  be  explained  in  detail  later. 

When  the  use  of  memory  is  critical  so  that  it  is 
uneconomical  to  represent  relation  explicitly  in  the  memory# 
intensional  representation  techniques  should  be  used.  Here  a 
relation  or  set  is  represented  by  a  formula  or  expression  for 
computing  that  relation  or  set.  Operations  on  the  set  or 
relation  are  implemented  as  formal  operations  on  the 
expression.  Because  the  relations  have  well  defined  algebric 
properties#  this  seems  feasible.  As  we  can  see#  an 
intensional  representation  is  really  just  a  variant  of  a  lazy 
evaluation  mechanism.  [Ref.  1]  We  will  try  to  decide  on  the 
feasibility  of  this  kind  of  mechanism  in  Section  III. 

The  relational  operations  we  mentioned  above  belong  to  a 
relational  language  being  designed  at  Naval  Postgraduate 
School.  [Ref.  2]  In  relational  programming  entire  relations 
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are  Manipulated  rather  than  individual  data.  This  is 
analogous  to  functional  prograaaing  [Ref.  3],  in  which  entire 
functions  are  the  values  aanipulated  by  the  operators. 
Because  the  set  of  all  functions  is  the  subset  of  the  set  of 
all  relations,  relational  prograaaing  subsuaes  functional 
prograaaing.  Hence  anything  that  can  be  done  with  functional 
prograaaing  can  be  done  with  relational  prograaaing.  Thus 
relational  prograaaing  has  aany  of  the  advantages  of 
functional  prograaaing.  Although  relations  are  aore  general 
than  functions,  their  laws  are  often  siapler.  In  addition, 
relational  prograaaing  aore  directly  supports  non-linear  data 
structures  such  as  graphs  ami  digraphs,  than  does  functional 
prograaaing.  In  relational  prograaaing  the  basic  data  values 
are  theaselves  relations!  on  the  other  hand  in  functional 
prograaaing  there  is  a  separate  class  of  objects  used  for 
data  structures. 

The  objective  of  this  research  is  to  deteraine  the 
feasibility  of  iapleaenting  the  relational  language  on 
conventional  architectures  by  doing  the  worst  case 
asyaptotical  tiae  complexity  analysis  of  the  algorithas 
associated  with  the  relational  operations.  In  Section  II  we 
will  focus  on  the  algorithas  that  work  on  the  extensionally 
represented  relations  and  sets.  In  Chapter  III  we  will 
inspect  the  intensional  algorithas  and  define  a  aechanisa  to 
do  the  relational  operations  intensionally.  In  Chapter  IV  we 
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will  focus  on  the  issue  of  defining  pure  intensional  system 
to  see  if  we  can  do  the  relational  operations  without 
representing  any  relation  or  set  extensionally. 

A.  BACKGROUND  INFORMATION 


1*  Theorems 


Definitions 


In  this  section  we  will  provide  information  on 
relations#  and  the  extensional  representation  techniques  for 
the  relations  and  sets.  we  will  state  some  important 
theorems  that  will  be  helpful  in  our  analysis  and  prove  them. 
Because  our  relational  operations  work  on  binary  relations  we 
will  be  focusing  on  the  properties  of  binary  relations.  We 
will  assume  that  the  reader  already  has  some  background  on 
relations. 

We  often  want  to  treat  collections  of  two-ary  tuples 
where  the  components  of  each  tuple  are  the  elements  of  some 
sets.  The  set  of  all  such  two-ary  tuples  is  defined  as 
follows: 

Definition  1:  Let  A  and  B  be  sets.  The  cartesian 
product  of  the  sets  A  and  B  denoted  by  AXB#  is  the  set  of  all 
two-ary  tuples  such  that: 

{  <al#a2>  |  (al  i  A)  A  (a2  c  B)} 

The  following  definition  provides  some  additional 
terminology . 


Definition  2:  Let  R  be  a  binary  relation  over  AXB. 
The  set  A  is  the  domain  or  the  ’Right  Members  Set”  (RIM  set) 
of  R;  B  is  the  codomain  or  the  "Left  Members  Set”  (LEM  set) 
of  R. 

Binary  relations  are  just  another  representation  of 
graphs  so  the  following  definitions  and  theorems  will  be 
useful  in  our  study. 

Definition  3:  A  digraph  is  an  ordered  pair  F*<C,R> 
where  R  is  a  binary  relation  on  the  set  C.  The  set  C  is  said 
to  be  the  set  of  vertices  of  digraph  F  and  the  tuples  of  R 
corresponds  to  arcs  (edges)  of  F. 

Definition  4:  Let  F*<C,R>  be  a  digraph  with  nodes  d 
and  e.  An  undirected  path  6  from  d  to  e  is  a  finite  sequence 
of  nodes  6  *<b0r  bj,,  b2,..*«*bn>  such  that: 

1.  bo  *  d 

2.  bn  -  e 

3.  For  all  b^  such  that  n  >»  i  >■  0,  either  b^Rbi+l 
or  bi+iRbj,. 

If  b^Rb^+i  for  all  b^,  n  >  i  >  0,  then  G  is  a  directed  path 
from  d  to  e.  The  node  d  is  the  initial  node  of  G  and  e  is 
the  terminal  node  of  G.  The  length  of  the  path  G  is  n.  If 
all  the  nodes  of  G  are  distinct  except  the  first  and  last 
then  G  is  a  simple  path.  If  bo  is  the  same  as  bn,  then  G  is 
a  cycle.  If  G  is  both  simple  path  and  a  cycle,  then  G  is  a 
simple  cycle. 


i  fp 


Definition  5:  Let  S  be  a  binary  relation  on  B.  Then 
S  is  reflexive  if  xSx  for  every  x  in  B.  S  is  irreflexive  if 
{not(xSx)}  is  true  for  every  x  in  B.  S  is  symmetric  if  xSy 
implies  ySx  for  every  x,y  e  a.  S  is  antisymmetric  if  xSy  and 
ySx  together  imply  x«y  for  every  x#y  e  B.  S  is  transitive  if 
xSy  and  ySz  together  imply  xSz  for  every  x#y,z  e  B. 

Definition  6:  Let  R  be  a  binary  relation  on  a  set  B. 
The  transitive  (reflexive#  symmetric)  closure  of  R  is  the 
relation  S  such  that: 

1.  S  is  the  super  set  of  R. 

2.  S  is  transitive  (reflexive#  symmetric) . 

3.  For  any  transitive  (reflexive#  symmetric) 
relation  T#  if  T  is  the  super  set  of  the  R  then  T 
is  the  super  set  of  the  relation  S. 

We  can  denote  the  transitive  closure  of  R  by 
trans(R)#  the  reflexive  closure  of  R  by  refl(R)#  the 
symmetric  closure  by  symm(R)#  and  the  transitive#  reflexive 
closure  by  trans-ref 1(R) .  Obtaining  the  closure  of  a  binary 
relation  can  be  easily  understood  in  terms  of  digraphs.  For 
example#  a  digraph  represents  a  reflexive  binary  relation  if 
it  has  loops  on  every  node.  So  given  a  binary  relation 
represented  by  a  digraph  we  can  obtain  the  reflexive  closure 
of  this  relation  by  adding  a  loop  to  every  node  of  the 
digraph  which  does  not  already  have  one.  Let  E  be  the 
equality  relation  on  an  arbitrary  set  X;  that  is# 
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E  *  {<a,a,>|  a  c  x} 

then  by  using  this  relation  we  can  state  a  theorem  as 
follows : 

Theorem  1:  Let  R  be  a  binary  relation  on  a  set  B. 
Then  refl(R)«R  E,  where  E  is  the  equality  relation  on  the 
set  B. 

Proof:  Let  S  »  rUe.  We  show  that  S  satisfies 

Definition  6.  By  construction  S  is  reflexive  and  S  is  the 
super  set  of  R.  Assume  T  is  a  reflexive  relation  on  B  and  T 
is  the  super  set  of  R.  We  have  to  show  T  is  the  super  set  of 
S.  Let's  take  an  arbitrary  tuple,  say  <s,t>,  which  is  the 
member  of  R.  If  s*t,  then  <s,t>eT  because  the  T  is 
reflexive.  If  <8,t>cR,  then  <s,t>cT  because  the  T  is  the 
super  set  of  R  by  assumption.  So  if  <s,t>cS,  then  <s,t,>eT. 
So  as  a  result,  the  definition  6  is  satisfied  and  S*refl(R). 

Definition  7:  Let  R  be  a  binary  relation  on  a  set  A 
and  let  n  be  a  natural  number.  Then,  the  nth  power  of  R, 
denoted  Rn,  is  defined  as  follows: 

1.  R°  is  the  relation  of  equality  on  the  set  A: 

R°  ■  {<x,x>|xSA} 

2.  Rn+1  «  RnR 

Theorem  2:  Let  R  be  a  binary  relation  on  the  set  B. 

Then 

trans  (R)  »  U  Ri  *  R  IJ  R2  (J  R3  I  I  r4 . 

i-1  w 

Proof:  The  proof  can  be  done  in  two  parts. 


L.  VjJ  R*  is  a  subset  of  trans  (R).  We  will  first 

show  by  induction  that  R*1  is  a  subset  of  trans(R) 
for  every  "n"  greater  than  zero. 

a.  (Basis)  From  definition  6,  part  2,  it  is 
immediately  apparent  that  R  is  a  subset  of 
the  trans (R) . 

b.  (Induction)  Assume  Rn  is  a  subset  of 
trans (R) ,  and  n  is  greater  than  or  equal  to 
1.  Let  <s,t>  be  the  member  of  Rn+1.  Since 

Rn+1*RnR,  there  exists  some  u  such  that  u  is  the 

member  of  set  Br  <s,u>.  c  R  and  <u,t>  is  the 

member  of  Rn.  By  the  induction  hypothesis  and 

the  basis  step*  <s,u>  is  the  member  of  trans (R) 

and  <u,t>  is  the  member  of  trans(R).  Since 

trans (R)  is  guaranteed  to  be  transitive  it 

follows  that  <s,t>  is  the  member  of  trans (R)  * 

thus  establishing  that  Rn+1  is  a  subset  of 

trans (R).  Since  R  is  a  subset  of  trans (R)  for 

all  n>*l,  we  conclude  that: 

U  R*  CH  trans  (R) 
i-1 

2.  trans(R)  is  a  subset  of  the  IJ  R*.  We  will  first 

i-1 


show  that: 


is  transitive.  Let  <a,t>  and  <t,u>  be  arbitrary 

aeabers  of  II  R*#  then  for  some  integers  W>-1 
wi-l 


and  Y>-1,  <s,t>  is  the  aeaber  of  Rw  and  <t#u>  is 
the  aeaber  of  R*.  Then  <s#u>  is  the  aeaber  of 


R*R*#  and  because  we  know  that  R*R*-RW+Y,  <s#u> 


is  the  aeaber  of 


l  R^  and  therefore 
i-1 


is  transitive.  Because  trans(R)  is  contained  in 
every  transitive  relation  which  contains  R#  it 


follows  that  trans(R)is  a  subset  of 


i-1 


Proa  part  1  and  2  we  can  write: 

trans  (R)  -  R  U  R2  \j  R3  (J  R* . 

by  using  the  basic  set  properties. 

Theorea  3:  Let  R  be  a  binary  relation  on  a  set  B 
which  is  of  cardinality  n.  Then 

trans  (R)  *  l  l  R* 

^i-l 


Proof:  We  will  show  that  id  is  a  subset  of  l)  R* 

i-1 

for  all  j>0.  Assuae  <s#t>  is  the  aeaber  of  r3#  then  there  is 
a  directed  path  of  length  j  froa  s  to  t  in  the  digraph  <B,R>, 


and  by  deleting  cycles  froa  this  path  we  can  obtain  a  simple 
directed  path  going  froa  s  to  t.  Because/  in  a  graph  with  n 
nodes#  the  longest  simple  path  is  liaited  to  length  n,  it 


follows  that  <s*tt>  is  ths  member  of  R^  foe  some  n>-i>0 


This  r3  is  ths  subsst  of  1  )  RA  for  j>0. 

w  i-1 

Definition  8:  if  R  is  a  binary  relation  on  a  set  B, 
then  san:R  denotes  trans(R)*  the  transitive  closure  of  R*  and 
fansR  denotes  transitive  reflexive  closure  of  R  (trans- 
refl(R) ) • 

Mow  we  will  go  into  the  theoreas  that  will  be  helpful 
in  finding  the  asymptotical  time  complexity  behaviour  of  the 
algorithms. 

Definition  9s  Let  f  and  g  be  functions  and  let  their 
domains  be  the  set  of  natural  numbers  and  the  codomains  be 


the  set  of  real  numbers ,  then  g  asymptotically  dominates  f, 
or  f  is  asymptotically  dominated  by  g,  if  there  exists  s>-0 
and  t>»0  such  that  |f  (n)  |»<t|g(n)  |  for  all  ii>*s. 

Example:  Let  f(n)«n/2  and  g(n)»n3  for  all  natural 
numbers  *n"»  the  above  definition  is  satisfied  by  setting  the 
8*0  and  t*l»  hence  g  asymptotically  dominates  f. 

Definition  10:  The  set  of  all  functions  which  are 


asymptotically  dominated  by  a  given  function  h  is  denoted  by 
0(h) ,  and  is  read  as  "big-oh  of  h”,  or  "order  h".  If  a  given 
function  say  j  is  the  element  of  0(h) ,  then  j  is  said  to  be 
0(h). 
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Theorea  4:  Let  the  functions  t,  g ,  r  be  the  kind  of 
functions  which  asp  the  natural  nuabers  to  the  real  nuabers. 


Then: 

1.  f  is  0(f). 

2.  if  f  is  0(9)  then  c*f  is  0(g)  for  any  real  nuaber 
c. 

3.  if  f  and  h  are  both  0(g) ,  then  their  sua  (f+h), 
(where  (f  +h)  (n)  -f  (n)  +h  (n) .)  is  0(g). 

Proof: 

1.  To  show  f  asaptotically  doainates  t,  we  choose 
s*0  and  t*l  and  apply  definition  9.  Thus  by 
definition  9  f  doainates  f. 

2.  If  f  is  asyaptotically  doainated  by  g,  then  for 

soae  natural  nuaber  m,  k  and  for  n  greater  than 
or  equal  to  k,  absolute  value  of  f (n)  will  be 
less  than  or  equal  to  the  product  of  the  a  and 
g  (n)  #  i.e.: 

a*|g (n) |  >-  Jf (n) | 

If  we  aultiply  both  sides  of  this  inequality  by 
an  arbitrary  real  nuaber  c  the  inequality  reaains 
the  saae.  Now  let  a*c  be  equal  to  real  nuaber  z 
then  we  rewrite  the  inequality  as  below: 

**|g(n)  |  >■  c*|f  (n)  | 

where  z  >*c.  So  by  definition  9,  c*f(n)  is  0(g). 


3.  Suppose  £  and  r  aca  both  0(g) ,  than  thara  exists 
natural  nuabers*  q $  a ,  x,  x  such  that 
q*|g(n)  |>-|f  (n)  I  in  the  case  n  is  greater  than  or 
equal  to  a,  s*  |g  (n)  |  >«|  r  (n)  |  in  the  case  n  is 
greater  than  or  equal  to  x.  Mow  assuae  Q-q+z  and 
G*a  (where  a>x) .  Then  we  write: 

q*|g(n)  |+z*jg(n)  |>*|r  (n)  |+|f  (n)  |»|£(n)+r(n)  | 
or 

Q*|g(n)|  >-  (r+f)(n) 

So  r+f  is  0(g) . 

Me  usually  represent  the  tiae  consuaed  by  an 
algoritha  by  a  coaplexity  function#  say  g,  then  0(g)  is 
called  the  asyaptotical  tiae  coaplexity  behaviour  of  the 
algoritha.  Mote  that  the  functions  that  have  the  saae 
asyaptotical  tiae  coaplexity  behaviour  may  not  cost  us  the 
saae.  Suppose  the  coaplexity  function  of  an  algoritha  is  the 
integer  aultiple  of  the  tiae  coaplexity  function  of  another 
algoritha  and  suppose  both  algorithas  have  the  saae 
asyaptotical  tiae  coaplexity  behaviour.  Clearly  the  first 
algoritha  is  aore  expensive  than  the  other  but  they  have  the 
saae  asyaptotical  tiae  coaplexity  behaviour.  So  while  using 
the  asyaptotical  tiae  coaplexity  behaviour  as  the  aeasure  we 
have  to  be  careful.  In  order  to  aake  this  fact  clear  we  will 
give  an  exaaple. 


Example : 

Suppose  two  algorithms  0  and  E  have  the  complexity 
functions  g  and  h  respectively  and  let  these  complexity 
functions  be , 

g  -  K*m  +  cl 
h  -  L*(m2)  +  C2 
where  K  *  40 *L  (constants) 

Cl  ■  C2  (constants) 

Then  for  m  *<  40  the  algorithm  B  is  less  costly  than  the 
algorithm  D  and  for  m>40  the  algorithm  D  is  less  costly  than 
the  algorithm  B  but  the  complexity  function  of  algorithm  D 
asymptotically  dominates  the  complexity  function  of  the 
algorithm  B.  So  if  we  choose  the  algorithm  D  by  only  looking 
at  its  asymptotical  time  complexity  behaviour  and  if  the  "m" 
does  not  take  on  values  greater  than  40,  we  would  lose  time 
instead  of  saving  time. 

From  this  point  on  we  will  use  the  order  notation  in 
which  the  explicity  specification  of  the  function  is  written 
in  the  parenthesis  rather  than  the  name  of  the  function,  so 
that  0(n2)  denotes  the  set  of  functions  that  are 
asymptotically  dominated  by  f(n)  ■  (n2). 

We  can  write  the  classes  of  different  complexity  in 
order  of  increasing  complexity  as  follows: 

0(c)G>(log  n)<$(n)<§(n(log  n)  )^(n2)<$(c*»)<p(nl) 


Definition  11:  If  poly  (a)  is  a  polynomial  of  degree 


a,  then  poly  (a)  is  0(a8). 

2.  The  Bxtsnsional  Representations  of  Relations 

There  are  several  representation  techniques  for 
representing  relations.  Aaong  them,  incidence  aatrix  (or 
adjacency  matrix),  adjacency  list  [Ref.  5]  and  table 
representations  [Ref. 2]  are  the  most  coaaon  ones.  There 
exists  other  representation  techniques  which  are  inherently 
the  same  as  the  techniques  given  above. 

We  can  define  the  incidence  aatrix  of  a  relation  as 
follows:  Let  R  be  a  relation  with  a  tuples  where  a>*l.  The 
incidence  aatrix  of  relation  R  is  a  2-diaensional  aZa  array# 
say  M#  with  the  property  that  M[j#k]«l  if  and  only  if  the 
tuple  <Aj,Ak>  is  in  relation  R#  where  the  individual  Ai 
belongs  to  the  codoaain  and  the  individual  Aj  belongs  to  the 
domain  of  the  relation  R. 

Proa  an  incidence  aatrix  one  can  readily  determine  if 
a  tuple  is  in  the  relation  in  question.  In  general  the 
algorithms  that  work  on  the  incidence  aatrix  representation 
of  relations  have  0(n2)  tiae  complexity  behaviour#  and  if  the 
incidence  aatrix  of  a  relation  is  sparse  the  space 
utilization  is  not  efficient.  We  will  discuss  this  issue  in 
detail  later  in  the  storage  coaplexity  analysis  of  the 
extensional  representation  techniques.  A  sample  incidence 
aatrix  is  shown  in  Figure  1. 


In  our  system  we  will  use  a  representation  technique 
which  is  very  similar  to  the  incidence  matrix  representation, 
namely  Hash- Incidence-Vector  representation.  We  can  define 
the  incidence  vector  as  follows:  Let  R  be  a  relation  with  p 
tuples  where  p>*l,  the  incidence  vector  of  relation  R  is  a 
bit  vector,  say  B,  with  the  property  that,  if  there  are  n 
distinct  individuals  in  the  domain  of  relation  R  and  if  the 
i'th  individual  of  the  domain  and  the  k'th  individual  of  the 
codomain  are  in  relation  with  each  other  under  the  relation 
R,  the  (k-1)  *n+i*J'th  bit  of  B  is  1  otherwise  it  is  0.  Let 
the  cardinality  of  the  domain  be  n  and  the  cardinality  of  the 
codomain  be  s,  then  the  length  of  B  is  equal  to  (n*s)  or  in 
terms  of  number  of  memory  locations  it  is  equal  to 
(ceiling  ((n*s)/C))  where  C  is  the  memory  word  length.  In  the 
hash  incidence  vector  representation  the  domain  (RIM  set)  and 
codomain  (LEM  set)  individuals  are  represented  by  records  in 
the  separate  hash  tables,  i.e.,  the  LEM  set  individuals  are 
represented  by  the  records  in  a  hash  table  called  Left  Hash 
Table  and  the  RIM  set  individuals  are  represented  in  a  hash 
table  so  called  Right  Hash  Table  (RHT).  The  records  of  the 
LEM  set  individuals  are  further  linked  to  each  other 
establishing  a  linked  list  structure  in  the  LHT  and  similarly 
for  the  RIM  set  individuals.  Each  record  mentioned  above  has 


a  field  in  which  an  integer  to  be  used  in  index  computation 
is  stored.  The  Hash-Incidence-Vector  for  the  relation  W  is 
shown  in  Figure  2. 


In  the  adjacency  list  representation  the  m  rows  of 
the  incidence  matrix  are  represented  as  m  linked  lists. 
There  is  one  list  for  each  domain  individual.  The  nodes  in 
list  i  represent  the  individuals  that  are  in  relation  with 
the  individual  i  of  the  domain  set.  Bach  node  has  at  least 
two  fields,  one  of  these  fields  represents  t^e  individual 
that  is  in  relation  with  the  i'th  individual  of  domain  and 
second  field  being  a  link  field  is  used  to  construct  the 
linked  list  structure.  A  sample  configuration  of  an 
adjacency  list  is  shown  in  Figure  3. 

The  table  representation  of  a  relation  is  the 
simplest  representation  technique  with  respect  to  others.  We 
can  define  this  representation  technique  as  follows:  Let  R 
be  a  relation  with  m  tuples,  the  table  representation  of  R  is 
a  2Xm  array,  say  N,  with  the  property  that  M[k,l]*Ai  and 
M[k,2]*Aj  where  Ai  and  Aj  are  the  individuals  of  the  k'th 
tuple  (<Ai,Aj>)  of  the  relation  R  and  Ai  belongs  to  the 
codomain  and  Aj  belongs  to  the  domain  of  the  relation  R.* 

‘Because  our  relational  operations  are  defined  by  usi^g 
the  notations  used  by  Russel  and  Whitehead  in  PrinciPia 
Mathematics  to  <56,  the  order  of  domain  and  codomain  will 
(unussually)  5eT  reversed. 


List  Representation  £or  Relation  w 


Sometimes  it  is  beneficial  to  represent  the  table  of 
a  relation  as  a  linked  list  of  records  rather  than  an  array. 
Bach  record  of  the  linked  list  represents  a  tuple  of  the 
relation  in  question.  A  relation  represented  in  this  kind  of 
representation  is  shown  in  Pigure  4. 

3.  The  Bxtensional  Representations  of  Sets 

One  representation  of  sets  is  to  represent  the 
members  of  a  set  in  a  binary  tree  structure  in  which  the 
members  are  represented  as  nodes.  An  alternative  to  the  tree 
representation  is  based  on  the  hash  coding,  in  which  the 
members  of  a  set  are  stored  in  a  hash  table.  In  this 
representation  the  storage  usage  is  poor  because  in  some 
instances  we  may  have  a  lot  of  unused  hash  table  entries.  In 
addition  we  need  additional  links  that  thread  the  records 
and/or  hash  table  entires  corresponding  to  set  members,  in 
order  to  do  set  operations.  On  the  other  hand  in  this 
representation  the  membership  test  operation  becomes  constant 
time. 

Another  representation  technique  is  a  bit  vector 
representation  of  sets.  In  this  representation  technique  a 
linearly  ordered  set,  the  so  called  Universal  set  (from  which 
all  sets  are  created)  is  represented  in  an  array  or  linked 
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list  structure  and  all  the  other  sets  that  are  subsets  of 
this  set  are  represented  by  bit  vectors.  So  a  subset  C  of 
the  universal  set  UNIV  is  represented  as  a  bit  vector  of  a 
list  where  a  is  the  cardinality  of  the  universal  set.  Let's 
say  the  bit  vector  representing  the  set  C  is  A;  the  K'th  bit 
of  the  A  is  1  if  and  only  if  the  K'th  individual  of  UNIV  is 
the  aeaber  of  the  set  C.  This  representation  has  aany 
advantages.  First  of  all/  the  aeabership  test  becoaes  very 
easy.  Secondly/  the  set  operations  can  be  done  by  using  fast 
logical  operations  (and/  or/  union/  not)  on  bit  vectors. 
Further  in  the  case  the  bit  vectors  are  not  largely  sparse/ 
the  space  utilisation  is  efficient  with  respect  to  the  other 
representation  techniques. 

The  last  representation  technique/  and  the  aost 
coaaon  one/  is  the  list  representation.  In  this 
representation  the  aaount  of  aeaory  needed  is  proportional  to 
the  cardinality  of  the  set  being  represented,  and  there 
exists  linear  tiae  algor ithas  for  doing  set  operations.  In  a 
practical  sense  soae  algor ithas  are  slightly  expensive  even 
though  their  tiae  coaplexity  behaviour  is  linear.  For 
exaaple,  the  union  and  intersection  operations  require  tiae 
proportional  to  the  sua  of  the  cardinalities  of  the  operand 


sets 


Most  of  the  tine/  it  is  beneficial  to  use  a  dynamic  data 
structure  rather  than  a  static  one  in  this  representation 
technique.  The  linked  list  representation  of  a  set  is  shown 


in  Pigure  5. 

B.  THE  STRUCTURE  07  THE  SYSTEM 

In  this  chapter  we  will  describe  the  extensional 
representation  system  that  we  propose  and  will  discuss  the 
various  properties  of  the  system. 

1.  Consideration  in  Selecting  a  Set  Representation 

We  have  described  the  representation  techniques  for 
sets  before;  we  will  use  the  features  of  the  list 
representation  and  the  hash  representation  rather  than  one  of 
the  other  representations.  The  prime  reason  for  doing  this 
is,  our  relational  operations  produce  a  significant  number  of 
intermediate  sets  and  we  have  to  represent  the  sets  so  that 
we  use  as  large  portion  of  the  memory  as  needed.  So  we  can 
not  use  the  hash  representation  by  associating  a  hash  table 
with  each  set  created.  But  also  we  would  not  want  to  lose 
the  constant  time  membership  test  opportunity,  so  we  will  use 
a  system  wide  hash  table  in  which  each  individual  of  a  set  is 
represented  as  a  record  which  is  connected  to  the  hash  table 
entry  which  this  individual  hashes  into.  Furthermore,  the 
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Figure  5.  The  Linked 


individuals  that  belong  to  the  saae  sat  are  linked  to  each 
other  in  a  linked  list  structure.  This  hash  table  will  be 
explained  further,  later  in  this  section.  So  the  resulting 
representation  technique  has  features  froa  both  the  hash 
representation  and  the  list  representation. 

We  need  to  aaintain  the  sets  as  linked  lists  in  this 
hash  table  in  order  to  be  able  to  keep  track  of  the 
individuals  in  a  set  and  aany  relational  operations  are 
required  to  exaaine  all  the  aeabers  of  a  set.  Why  don't  we 
select  the  tree  representation?  Because  the  aeabership  test 
and  insertion  is  aore  costly  than  the  hash  representation. 
In  addition,  since  the  relational  operations  will  produce  a 
lot  of  sets,  it  would  be  costly  to  execute  an  O(log(n))  tiae 
algoritha  to  insert  in  the  right  place  in  the  tree 
constructed  so  far  the  record  for  each  individual  of  a  set 
being  produced  by  a  relational  operation. 

2.  Space  Considerations  for  Selecting  Relation  Repre¬ 
sentation 

In  this  section  we  intend  to  discuss  the  space 
requireaents  of  various  relation  representation  techniques. 
We  will  aainly  focus  on  three  representation  techniques: 


a.  Bash-Incidancs-Vactor  raprasantation. 

b.  Tabla  raprasantation. 

c.  Adjacancy  List  raprasantation. 

Thasa  raprasantation  tachniquas  hava  baan  axplainad  in  tha 
pravious  sac t ion. 

According  to  tha  currant  dafinition  of  tha  oparations 
in  our  ralational  languaga  thara  will  ba  a  any  rafarancas  to 
tha  various  ralations  par  haps  avan  in  a  ona  lina  of  tha 
prograa*  so  if  wa  conaidar  tha  dansity  of  thasa  rafarancas  wa 
would  not  want  tha  undar  lying  aaaory  aanagaaant  systaa  to 
accass  to  tha  disk  aost  of  tha  tiaa.  This  iaplias  that  tha 
storaga  raquiraaants  of  a  particular  raprasantation  tachniqua 
bacoaa  vary  critical.  So  avan  though  our  aain  intant  is  to 
analyxa  tha  tiaa  raquiraaants  iapliad  by  tha  raprasantation 
tachniquas  on  tha  algorithas  of  tha  ralational  oparationsr  wa 
do  not  want  to  salact  thosa  raprasantation  tachniquas  that 
ara  in  tha  first  placa  infaasibla  in  tha  spaca 
conaidar ations . 

a.  Storaga  Raquiraaants  of  tha  Incidanca  Vactor 
Rapr asantation : 

As  wa  axplainad  bafora,  thara  axist  (a*n)  antrias 
of  tha  incidanca  vactor  for  raprasanting  a  ralation  which  has 


a  LBN  sat  with  tha  cardinality  "n"#  and  a  RIM  sat  with  tha 
cardinality  "n".  However#  wa  packed  this  incidence  vector 
into  (a*n)/C  aaaory  locations  where  *C*  is  tha  aaaory  word 
length.  Tha  conditions  under  which  aaxiaua  storage  wasting 
occurs  is  tha  first  question  wa  should  ask  ourselves.  In  the 
worst  case  the  cardinality  of  the  LEM  set  and  the  cardinality 
of  the  RIM  set  becoae  equal#  let's  say  ”n".  In  this  case 
there  should  be  at  least  *n*  l's  in  the  incidence  vector  and 
(n2)-n  o*s#  so  the  overhead  is: 

(n2)  -  n  bits. 

Me  said  there  should  be  at  least  "n"  l's  in  the  incidence 
vector#  because  every  L8M  set  individual  is  in  relation  with 
at  least  one  RIM  set  individual  and  analogously  every  RIM  set 
individual  is  in  relation  with  at  least  one  LBM  set 
individual.  If  this  weren't  the  case#  we  would  ask  the 
question#  How  did  that  individual  cone  to  be  inserted  into 
the  RIM  (or  LBM)  set  of  the  relation  if  it  is  not  in  relation 
with  any  individual  in  the  LBM  (or  RIM)  set.  This  can  not 
occur#  since#  in  the  creation  of  the  relation  we  put  those 
individuals  in  the  LBM  set  of  the  relation  which  are  in 
relation  with  at  least  one  RIM  set  individual#  and  in  the 
sane  nanner  we  put  those  individuals  in  the  RIM  set  of  the 


relation  that  ara  in  relation  with  at  least  one  LEM  set 
individual  and  we  create  the  incidence  vector  according  to 
the  cardinalities  of  the  LBN  set  and  the  RIM  set  of  the 
relation.  So  in  general  the  incidence  vector  of  a  relation 
can  not  contain  less  than  *k”  l's  where: 
k  -  aax(a*n) 

a  ■  The  cardinality  of  the  LBN  set  of  the 
relation. 

n  *  The  cardinality  of  the  RIM  set  of  the 
relation. 

So  in  general  we  coapute  the  overhead  in  the  worst  case  by 
using  the  foraula  below: 

nuaber  of  unused  bits  •  a*n  -  aax(a*n) 

In  fact  we  can  not  consider  the  relation  represented  by  the 
incidence  vector  alone*  since  we  are  actually  representing 
the  LBN  and  the  RIM  set  of  the  relation  along  with  the 
incidence  vector  in  our  Hash-Incidence-Vector  representation. 
Hence  the  nuaber  of  fields  of  each  record  representing  a  LEM 
set  or  a  RIM  set  individual  should  be  taken  into  account.  In 
addition*  the  relation  has  a  record  in  a  hash  table*  which  we 
will  call  the  relation  table**  and  we  have  to  add  the  space 
occupied  by  that  record  to  our  cuaulative  foraula.  We  write 

*Ne  will  explain  the  structure  of  this  table  in 
subsection  3. 


the  ip«c«  complexity  formula  for  tha  Hash-Incidence-Vector 
rapraaantation  in  terms  of  number  of  bits  required,  as 
f ollows t 

fl  •  ceiling ( (m*n)/C)  ♦  K*(m  ♦  n)  +  D 

where: 

m  *  The  cardinality  of  the  LBM  set  of  the 
relation. 

n  *  The  cardinality  of  the  RIM  set  of  the 
relation. 

K  *  The  number  of  bits  required  by  each  RIM/LBM 
set  record  in  the  RBT/LHT. 

D  •  The  number  of  bits  required  by  the  record  of 
the  relation  in  the  relation  table, 
b.  The  Storage  Requirements  for  the  Adjacency  List 
Representation 

This  representation  technique  is  very  dynamic  and 
uses  a  large  portion  of  the  memory  as  needed.  Me  will 
investigate  what  would  happen  in  the  worst  case.  In  the  worst 
case  the  relation  may  be  a  universal  relation  on  its  LBM  and 
RIM  set,  which  means  that  each  RIM  set  individual  is  in 
relation  with  all  the  LBM  set  individuals  of  the  relation. 
Therefore,  we  need  (m*n)  records  to  represent  this  kind  of 


relation.  As  we  stated  in  the  Hash-Incidence-Vector 
representation  case#  we  can  not  assuae  that  the  relation  is 
represented  by  the  Adjacency  List  alone ,  so  in  order  to  make 
a  fair  comparison  between  the  Hash-Incidence-Vector 
representation  and  Adjacency  List  representation  we  will 
assuae  that  the  RIM  set  records  of  the  relation  are 
represented  in  the  RHT  as  it  was  in  the  Hash-Incidence-Vector 
representation  case  and  that  the  records  of  the  left 
individuals  that  are  in  relation  with  one  or  more  right 
individuals  are  connected  to  the  RHT  records  of  these  right 
individuals  in  the  linked  list  structures.  In  fact,  in  this 
representation  soae  relational  operations  are  very  costly; 
for  example  in  order  to  obtain  the  LEM  set  of  the  relation 
being  represented  in  this  aanner  we  aust  trace  through  all 
the  linked  lists  of  the  kind  explained  above.  So  under  the 
tiae  considerations  we  would  want  to  represent  the  converse 
of  that  structure  in  the  LHT  also,  which  makes  some  of  the 
algorithms  simpler  than  they  otherwise  would  be.  But  let's 
assuae  we  only  assuae  the  space  requirements  and  we  did  not 
do  that.  In  the  worst  case  of  the  Adjacency  List 
representation  the  cumulative  storage  requirement  for 
representing  a  relation  is  given  in  terms  of  the  number  of 
bits  below: 


f2  ■  R*n  ♦  L*(a*n)  +  D 


where* 


a  ■  The  cardinality  of  the  LEM  set  of  the 
relation. 

n  ■  The  cardinality  of  the  RIM  set  of  the 
relation. 

K  ■  The  nuaber  of  bits  required  by  each  RIM 
record. 

L  ■  The  nuaber  of  bits  required  by  each  linked 
list  record  (Our  general  set  structure 
record) . 

0  *  The  nuaber  of  bits  required  by  the  record  of 
the  relation  in  the  relation  table, 
c.  Storage  Requireaents  for  the  Table  Representation 

The  Table  Representation  requires  aore  storage 
than  the  Adjacency  List  representation  aost  of  the  tiaer 
including  the  worst  case  when  the  relation  is  a  universal 
relation  on  its  LEM  set  and  the  RIM  set.  This  is  because  the 
adjacency  list  representation  reaoves  the  duplicates  of  the 
individuals  in  the  right  coluan  of  the  table.  In  the  table 
representation  each  tuple  of  the  relation  is  represented  as 
it  is,  and  that  causes  the  duplication  of  the  right 
individuals  and  the  left  individuals  in  the  coluans  of  the 
table  (if  we  look  at  the  linked  list  structure  of  the  table 
as  conventional  table).  The  Adjacency  list  representation 
does  not  represent  a  right  individual  in  more  than  one  place. 


We  write  the  cumulative  storage  requirements  of  the  Table 
Representation  in  the  worst  case  and  in  terms  of  number  of 
bits  as  follows * 


f 3  -  T*n*m  +  D 


where: 

n  •  The  cardinality  of  the  RIM  set  of  the 
relation. 

m  -  The  cardinality  of  the  LEM  set  of  the 
relation. 

T  •  The  number  of  bits  required  for  each  table 
record. 

D  *  The  number  of  bits  required  for  the  record  of 
the  relation  in  the  relation  table, 
d.  Comparison  of  Storage  Requirements 

Mow  we  have  to  compare  the  formulas  we  found  for 
the  various  representation  techniques.  In  fact  the  Hash- 
Incidence-Vector  representation  always  requires  the  same 
amount  of  storage  for  a  given  *n”  and  ”mN,  so  actually  we  are 
comparing  the  worst  case  requirements  of  the  other 
representation  techniques  with  the  fixed  requirement  of  the 
Hash-Incidence-Vector  representation.  Let's  subtract  ”fl” 
from  "f2”,  we  find: 

f 2-f  1  ■  L*m*n  -  R*m  -  (m*n)/C 
If  we  factor  out  the  ”m”: 

f 2-fl  -  m*(L*n  -  K*m  -  n/C) 


and  equate  tha  laft  sida  to  0,  we  obtain: 


L*n  -  K  -  n/C  *  0 


and  wa  find: 


n  -  K/(L  -  (1/0) 

This  naans  that: 

n  >  ceiling (K/(L-(1/C) ) ) 

and  in  tha  worst  case  of  tha  Adjacency  List  representation, 
Hash-Incidence-Vector  representation  is  always  better  than 
the  Adjacency  List  representation.  If  we  assume  that  the 
memory  word  length  and  the  pointers  are  16  bits  then 
according  to  our  definition  of  the  fields  of  the  records, 

K  -  80 
0-32 
C  -  16 
T  -  48 

then  n  should  be  greater  than  3.  Because  we  have  indicated 
that  the  Table  Representation  requires  more  storage  than  the 
Adjacency  List  representation,  there  is  no  need  to  do  the 
comparison  for  the  table  representation.  So  we  conclude  that 
in  the  worst  case  the  Table  representation  and  the  Adjacency 
List  representation  dominate  the  Hash-Incidence-Vector 
representation  in  storage  consideration.  In  addition  to 
that,  as  one  of  the  "m”  or  ”n”  becomes  smaller  than  the  other 
the  number  of  redundant  bits  in  the  incidence  vector 
decreases.  Suppose  n<m  so  there  are  m  l's  in  the  incidence 


vector  but  there  are  (m*n)  entries  and  (m*n)<(m*m).  On  the 
other  hand  we  can  not  expect  the  worst  case  to  occur  every 
time  under  the  practical  considerations/  so,  if  the  relation 
being  represented  is  a  bijective  function  (one  to  one,  and 
onto),  clearly  the  redundancy  in  the  incidence  vector  becomes 
maximum.  This  is  the  best  case  for  the  Table  Representation 
and  the  Adjacency  list  representation.  Let's  rewrite  the 
functions  fl,  f2  and  f3  under  this  case: 

(note  that  in  this  case  n*m) 
fl  »  2*K*n  +  (n*n)/C  +  D 
f  2  »  (K  +  L)  *n  +  D 
f 3  *  T*n  +  D 

if  we  subtract  fl  from  f2  and  equate  the  result  to  *,  and  if 
we  solve  "n"  in  the  resulting  equation,  we  find  that: 
n  »  C  *  (L  -  K) 

Hence  as  long  as  L<K  the  result  is  negative  that  shows  us  in 
that  case  the  Adjacency  List  representation  is  better  than 
the  Hash-Incidence-Vector  representation.  If  we  subtract  fl 
from  f 3  and  do  the  same  steps  we  find: 
n  -  c  *  (T  -  2*K) 

This  means  the  table  representation  is  much  better  than  the 
Hash-Incidencvj-Vector  representation  in  this  case.  Lastly, 
if  we  do  this  for  f2  and  f3,  we  can  not  find  n.  In  that  case 


we  apply  the  numerical  values  (in  our  case)  to  the  constants 
for  comparison;  if  we  substitute  the  values  given  for  L#  K 
and  T  and  compare  fl  and  f2  as  lines  having  different  slopes 
we  see  that: 

fl  «  112 *n  +  D 

f 2  -  48*n  +  D 

Therefore#  in  this  case  the  Table  representation  is  better  in 
storage  than  both  the  Hash-Incidence-Vector  representation 
and  the  Adjacency  List  representation. 

Selection  of  one  of  these  representation 
techniques  under  these  space  considerations  depends  greatly 
on  our  expectations  on  the  kind  of  relations  that  we  will  be 
working  on.  For  example  if  we  are  working  on  bijective 
functions  the  most  appropriate  representation  technique  is 
the  table  representation.  If  we  are  representing  trees  by 
using  binary  relations#  the  most  appropriate  technique  is  the 
Adjacency  List  representation.  However#  because  of  the 
nature  of  our  system  the  Hash-Incidence-Vector  representation 
becomes  attractive.  We  will  explain  this  by  giving  an 
example . 

Example : 

Suppose  we  have  a  relation  which  has  the  LEM  set 
cardinality  100  and  the  RIM  set  cardinality  200.  By 
substituting  these  values  for  the  constants  in  the  formulas 


for  fl  and  £2,.  we  find  the  storage  requirements  for  the 
Adjacency  List  representation  (in  the  worst  case  of  Adjacency 
List  representation)  to  be  82  Kbyte  and  the  storage 
requirements  for  the  Hash- Incidence-Vector  representation  to 
be  *.5  Kbyte.  In  the  best  case  of  Adjacency  List 
representation  (a  bijective  function) ,  the  storage 
requirements  for  the  Hash-Incidence-Vector  representation 
remains  the  same,  but  the  storage  requirements  for  the 
Adjacency  List  representation  drops  to  2.4  Kbyte. 

If  we  think  about  the  above  example,  in  the  best 
case  of  the  Adjacency  List  representation  we  do  not  gain 
much,  but  in  the  worst  case  we  lose  a  lot. 

In  the  analysis  of  the  algorithms  we  will  mainly 
focus  on  the  Hash-Incidence-Vector  representation,  and  the 
Table  representation.  We  will  inspect  the  relative 
efficiency  of  using  Hash-Incidence-Vector  representation 
instead  of  using  Table  representation  in  terms  of  time.  We 
will  not  look  into  the  Adjacency  list  representation,  because 
the  Hash  Incidence  Vector  representation  is  essentially  the 
same  as  the  Adjacency  List  representation  in  which  the  linked 
lists  are  represented  as  bit  strings.  This  analogy  is 
demonstrated  in  Figure  6. 


Figure  6.  The  Analogy  Between  the  Adjacency  List  Representa 
tion  and  the  Hash- Inc idence-Vec tor  Representation 


3.  System  Hash  Tables  and  Their  Structure 

The  system  consists  of  six  hash  tables  which  ace: 

a.  Relation  Table  (RT) . 

b.  Left  Hash  Table  (LHT) . 

c.  Right  Bash  Table  (RHT) . 

d.  Set  Table  (ST). 

e.  Set  Hash  Table  (SBT) . 

f.  Scratch  Hash  Table  (SCHT) . 

The  system  handles  the  collisions  by  using  the 
bucketing  technique.  In  this  technique  the  records  of  the 
individuals  that  hash  into  the  same  hash  table  entry  are 
linked  to  each  other  in  a  linked  list  structure  and  this 
linked  list  is  connected  to  the  hash  table  entry  in  question. 

a.  Relation  Table 

In  this  table  each  relation  known  by  the  system 
is  represented  by  a  record  which  is  connected  to  the  hash 
table  entry  into  which  the  identifier  of  the  relation  hashes. 
The  structure  of  the  record  is  shown  in  Figure  7. 


Figure  7.  The  Relation  Table  Record  Structure 


The  Rid  field  of  the  record  contains  the  character  string 
representing  the  relation's  identifier.  The  PFLM  field 


contains  a  pointer  which  points  at  the  first  record  of  the 
Left  Meabers  set  (LEM)  of  the  relation*  which  resides  in  the 
Left  Hash  Table.  The  PFRM  field  contains  a  pointer  to  the 
first  record  of  the  Right  Meabers  Set  (RIM)  of  the  relation* 
which  is  in  the  Right  Hash  Table  (RHT).  POOLS  is  also  a 
pointer  field*  which  contains  a  pointer  to  the  code 
(function)  representing  the  relation;  its  use  will  be 
explained  later  in  Section  III.  The  BASE  field  contains  the 
beginning  address  of  the  buffer  allocated  for  the  incidence 
vector  of  this  relation.  The  |RIM|  field  of  the  record 
contains  an  integer  which  is  the  cardinality  of  the  Right 
Meabers  Set  (RIM)  of  the  relation.  The  |  LEM|  field  contains 
an  integer  which  is  the  cardinality  of  the  Left  Members  Set 
(LEM)  of  the  relation  being  represented.  The  COLLINK  field 
contains  a  pointer  to  the  record  of  the  relation  which  has 
been  hashed  into  the  saae  hash  table  entry  as  a  result  of 
collision.  The  structure  of  the  Relation  Table  (at  one  point 
in  execution)  is  shown  in  Figure  8. 
b.  Left  Hash  Table 

This  hash  table  contains  the  records  of  the  LEM 
set  individuals  of  the  relations.  The  LEM  set  records  of  a 
relation  are  linked  to  each  other  in  a  linked  list  structure. 
The  structure  of  a  LHT  record  is  shown  in  Figure  9. 


The  Rid  field  of  the  record  contains  a  pointer  to  the 
character  string  representing  the  relation's  identifier.  The 
inclusion  of  this  field  is  necessary  in  order  to  distinguish 
the  same  individuals  of  the  LEM  sets  of  different  relations. 
The  TASB  field  contains  a  pointer  to  the  next  LEM  set 
individual's  record.  The  PML  field  contains  a  pointer  to  the 
memory  location  where  the  individual  being  represented  by 
this  record  is  stored.  The  INDEX  field  contains  an  integer 
which  will  be  used  in  computing  the  indices  of  the  Incidence 
Vector  corresponding  to  the  individual  being  represented  by 
this  record.  The  COLLINK  field  contains  a  pointer  to  the 
individual's  record  which  has  been  hashed  into  the  same  hash 
table  entry  as  a  result  of  collision.  The  PRRM  field 
contains  a  pointer  which  points  at  the  related  right  member's 
record  in  the  RBT. 

The  LHT  has  an  associated  hash  function  that  we 
will  call  "Left  Bash  Function”. 
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c.  Right  Hash  Table 


The  "Right  Hash  Table"  has  exactly  the  sane 
structure  as  the  LHT.  The  only  difference  is,  it  contains 
the  records  of  the  Right  Members  Set  individuals  of  the 
relations.  It  has  an  associated  hash  function  that  we  will 
call  "Right  Hash  Function".  Figure  10  shows  the  arrangement 
of  the  LHT,  RHT  in  combination  with  RT  at  one  point  in 
execution . 

d .  Set  Table 

This  hash  table  contains  the  records  of  the  set 
which  are  known  by  the  system.  The  record  structure  of  this 
table  is  shown  in  Figure  11. 

The  Sid  field  of  this  record  structure  contains 
the  character  string  representing  the  set  identifier.  The 
CARO  field  contains  an  integer  which  is  the  cardinality  of 
the  set  being  represented.  If  this  field  contains  >1,  then 
the  cardinality  of  the  set  has  not  been  computed.  The 
COLL INK  filed  contains  a  pointer  to  the  record  of  another  set 
(if  any)  which  has  been  hashed  into  the  same  hash  table  entry 
as  a  result  of  collision.  The  PSS  field  contains  a  pointer 
which  points  at  the  first  individual's  record  of  the  set  in 
question.  This  record  is  the  beginning  record  of  the  linked 
list  structure  that  represents  the  set.  As  we  mentioned 
earlier  the  records  of  the  individuals  are  also  connected  to 
the  entries  of  the  Set  Hash  Table  which  will  be  explained 
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next.  The  Set  Table  has  an  associated  hash  function,  which 
we  will  call  "Set  Hash  Function".  Figure  12  shows  the 
arrangement  of  the  Set  Table  at  one  point  in  execution. 


Sid 

CARD 

PSS 

COLLINK 

Figure  11.  Set  Table  Record  Structure 
e.  Set  Hash  Table 

This  hash  table  contains  the  individuals'  records 
of  the  sets.  As  explained  before  the  records  of  the 
individuals  that  are  the  member  of  the  same  set  are  linked  to 
each  other  in  a  linked  list  structure.  The  record  structure 
is  as  shown  in  Figure  13. 

The  Sid  field  of  this  record  structure  contains  a 
pointer  to  the  character  string  which  represents  the  set  to 
which  the  individual  being  represented  by  this  record 
belongs.  Inclusion  of  this  field  is  necessary  in  order  to 
distinguish  the  same  individuals  of  different  sets.  The  PML 
field  contains  a  pointer  to  the  memory  location  where  the 
individual  being  represented  by  this  record  is  stored.  The 
TASK  field  is  another  link  field  which  contains  a  pointer  to 
the  next  record  of  the  linked  list  structure  of  the  set.  The 
COLLINK  filed,  as  it  was  before,  contains  a  pointer  which 
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go«s  into  the  collision  chain.  Figure  14  shows  the 
arrangement  of  the  Set  Bash  Table  at  one  point  in  execution. 


Sid 

PML 

TASE 

COLLINK 

Figure  13.  The  SHT  Record  Structure 


f.  Scratch  Hash  Table 

The  Scratch  Hash  Table  is  exactly  the  same  as  the 
Set  Hash  Table  and  is  used  to  store  teaporary  sets  during 
relational  operations.  After  the  operation  terminates,  the 
records  of  this  table  are  disposed,  we  could  use  the  Set 
Hash  Table  for  this  purpose,  but  doing  pointer  updates  in 
such  a  crowded  table  becoaes  very  coaplex.  In  addition  the 
SCHT  need  not  be  as  large  as  the  other  hash  tables  since  it 
is  used  for  only  one  operation  and  is  cleaned  up  for  a 
subsequent  operation.  So  the  record  density  in  this  table 
will  be  very  low  and  the  possibility  of  collisions  decreases. 
It  is  the  impleaenter 's  decision  to  continue  to  use  the  Set 
Hash  Table  for  this  purpose  or  not.  Note  that  if  the  SHT  is 
used  for  this  purpose,  the  creation  of  teaporary  set 
identifiers  (for  the  temporary  sets)  becomes  necessary. 

4.  Bash  Functions 

As  we  mentioned  before,  each  system  hash  table  has  an 
associated  hash  function.  We  will  assume  that  the  reader  is 
already  familiar  with  Hash  Coding  and  collision  handling 
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Figure  14.  The  Arrangement  of  the  SHT  at  One  Point  in  Execu 
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techniques.  In  addition,  we  will  not  go  into  the  detail  of 
the  hash  functions;  i.e.,  we  will  not  consider  how  the  index 
of  a  hash  table  entry  is  computed  corresponding  to  an 
identifier  since  this  is  an  implementation  issue. 

The  hash  function  associated  with  the  Relation  Table 
(RT)  takes  a  relation  identifier  and  maps  it  into  the  index 
of  a  Relation  Table  entry.  The  hash  function  associated  with 
the  Set  Table  (ST)  takes  a  set  identifier  and  maps  it  into  a 
Set  Table  entry.  The  hash  function  associated  with  the  SCHT 
takes  an  individual  and  maps  it  into  a  SCHT  entry  and  so  on. 

The  hash  functions  associated  with  the  LHT,  RHT,  and 
SHT  have  slightly  different  properties.  The  hash  function 
asscciated  with  the  LHT  takes  the  individual  and  the  relation 
identifier  (which  identifies  the  relation  that  the  individual 
in  question  belongs  to)  and  concatenates  them;  then  it  maps 
the  resulting  identifier  to  a  LHT  entry.  This  is  done  in 
order  to  have  a  better  distribution  in  the  LHT.  The  hash 
function  associated  with  the  RHT  has  the  same  properties  as 
the  hash  function  associated  with  RHT,  only  the  table  we  are 
hashing  is  the  RHT  instead  of  LHT.  The  hash  function 
associated  with  the  SHT  takes  the  individual  and  the 
identifier  of  the  set  which  this  individual  belongs  to  and 
concatenates  them;  then  it  maps  the  resulting  identifier  to 
an  SHT  entry. 
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Each  hash  function  mentioned  above  tries  to  find  the 
record  of  the  individual  in  the  collision  bucket.  (If  any 
collision  occurred  before ,  there  will  be  a  collision  bucket 
connected  to  the  hash  table  entry  to  which  the  hash  function 
mapped  us.;  If  it  finds  the  record  of  the  individual,  it 
returns  the  pointer  to  the  record  of  this  individual; 
otherwise,  it  returns  the  pointer  to  the  last  record  of  the 
collision  bucket  or  the  hash  table  entry  found  (if  there  is 
no  individual  connected  to  this  hash  table  entry).  Of 
course,  it  will  inform  the  caller  about  the  kind  of  pointer 
returned. 

In  our  system  the  identifier  of  the  individual  is  the 
individual  itself;  hashing  functions  view  the  individuals  as 
bit  strings  and  compute  the  indices  of  the  hash  table  entries 
by  using  these  bit  strings.  Another  important  property  of 
our  system  is  the  individuals  of  a  relation  or  a  set  may  be 
of  different  types.  For  example  a  set  may  contain  a  relation 
or  another  set  as  a  member.  So  in  our  system  the  relations, 
sets,  integers,  characters,  character  strings,  bit  strings, 
reals,  etc.  are  all  individuals.  This  type  independency  is 
achieved  by  maintaining  the  pointers  to  the  memory  locations 
where  the  individuals  are  actually  stored,  in  the  data  fields 
of  the  hash  tables'  records,  rather  than  the  individuals 
themselves. 


« 
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5.  Referencing  the  Incidence  Vector 

Before  we  explain  the  way  we  reference  the  incidence 
vector  of  a  relation,  we  will  explain  how  we  arrange  the 
integers  stored  in  the  index  fields  of  the  LEM  set  and  RIM 
set  records  of  a  given  relation.  Given  a  relation,  the 
integers  associated  with  the  LEM  set  individuals  begin  with  1 
and  increase  by  K  where  K  is  the  cardinality  of  the  RIM  set 
of  the  relation;  i.e.,  if  the  cardinality  of  the  RIM  set  of 
the  relation  is  3  and  the  cardinality  of  the  LEM  set  of  the 
relation  is  2,  the  integer  stored  in  the  index  field  of  the 
first  LEM  set  individual's  record  will  be  1,  and  the  integer 
stored  in  the  index  field  of  the  second  LEM  set  individual's 
record  will  be  4,  and  so  on.  The  integers  stored  in  the 
index  fields  of  the  RIM  set  records  begin  with  1  and  increase 
by  1;  i.e,,  in  the  above  example  the  integer  stored  in  the 
index  field  of  the  first  RIM  set  individual's  record  will  be 
1,  the  integer  stored  in  the  index  field  of  the  second  RIM 
set  Individual's  record  will  be  2,  and  so  on.  The  beginning 
address  of  the  incidence  vector  of  the  relation  is  stored  in 
the  BASE  field  of  this  relation’s  record  in  the  Relation 


Table  (RT). 


Now  we  have  to  explain  how  we  reference  the  incidence 
vector  of  a  relation.  Suppose  we  are  given  a  tuple  and  a 
relation  The  question  is  whether  this  tuple  is  in  the  given 
relation  or  not.  Me  first  hash  with  the  right  component 
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individual  to  the  RHT  and  find  its  record,  then  we  hash  with 
the  left  component  individual  of  the  tuple  to  the  LHT  and 
find  its  record.  We  extract  the  integers  stored  in  the  INDEX 
fields  of  these  records  and  add  them  up;  then  we  subtract  1 
from  the  result  and  obtain  the  INDEX  of  the  incidence  vector 
entry  corresponding  to  this  tuple.  Let's  call  the  resulting 
INDEX,  "K".  (Of  course,  if  we  can  not  find  records  for  the 
individuals  above,  the  question  can  be  answered  immediately.) 
In  the  next  step  we  extract  the  beginning  address  of  the 
incidence  vector  from  the  record  of  the  relation.  Let's  call 
this  address  BASE.  Then  we  call  the  algorithm  below  with  the 
BASE  and  K  being  the  arguments.  Algorithm  reference  (K, 
BASE) : 

1.  Offset  *  ceiling (K/C) . 

2.  Location  *  offset  +  BASE  -  1. 

3.  Petch  the  contents  of  the  memory  location  by 
using  the  address  computed  in  step  2. 

4.  h  ■  K  -  (offset*C)  +  1. 

5.  Extract  the  h'th  bit  from  right  and  test  it.  If 
it  is  1  return  true,  else  return  false. 

In  the  above  algorithm  C  is  the  memory  word  length,  "off set" 
is  a  variable  of  type  integer,  "location”  is  a  pointer 
variable,  "h”  is  a  variable  of  type  integer.  We  needed  to  do 
the  above  computations  because  we  pack  the  n  bits  of  the 


incidence  vector  into  ceiling (n/C)  aeaory  locations#  where  C 
is  the  aeaory  word  length. 


6.  Table  Representation 

Another  representation  technique  that  we  will  be 
focusing  on  is  the  table  representation.  We  will  represent 
the  table  of  a  relation  as  a  linked  list  of  records  in  which 
each  record  represents  a  tuple  of  the  relation.  The  record 
structure  is  as  shown  in  Pigure  15. 


LEFT 


RIGHT 


LINK 


Pigure  15.  The  Structure  of  the  Table  Records 


The  LEFT  field  contains  a  pointer  to  the  aeaory  location 
where  the  left  coaponent  of  the  tuple  (which  is  an 
individual)  is  stored.  The  RIGHT  field  contains  a  pointer  to 
the  memory  location  where  the  right  component  of  the  tuple  is 
stored.  The  left  component  of  the  tuple  belongs  to  the  Left 
Members  Set  of  the  relation  (or  in  other  words  Codomain  of 
the  relation)  and  the  right  component  of  the  tuple  belongs  to 
the  Right  Members  Set  of  the  relation  (or  in  other  words  the 
Domain  of  the  relation).  The  LINK  field  contains  a  pointer 
to  the  record  which  represents  the  next  tuple  of  the 
relation. 


We  will  not  define  a  complete  environment  for  the 
table  representation.  Of  course  the  environment  defined  for 
the  Hash-Incidence-Vector  representation  (i.e.,  The  RT,  ST, 
SHT)  could  be  used  in  this  case  too. 

7.  About  the  Algorithms 

We  will  write  our  algorithms  in  English  step  by  step. 
In  the  time  complexity  analysis  we  will  refer  to  the  steps  of 
the  algorithms  and  associate  the  terms  of  the  complexity 
functions  with  the  steps.  The  comments  will  be  written  in 
parenthesis  between  the  steps  of  the  algorithms.  Sometimes 
we  will  insert  loops  as  steps  into  the  algorithms,  which  are 
written  in  a  PASCAL-like  algorithmic  language.  This  is  done 
to  make  the  algorithm  clear  to  the  reader. 

In  the  time  complexity  functions  we  will  use  the 
capital  letters  to  represent  the  constants  and  the  small 
letters  to  represent  the  variables.  Even  though  we  will  be 
inspecting  the  worst  case  asymptotical  time  complexity 
behaviour  of  the  algorithms,  and  constants  do  not  affect  the 
asymptotical  time  complexity  behaviour  of  the  algorithms,  we 
will  provide  the  complexity  functions  of  the  extensional 
algorithms  with  the  predicted  explicit  constants  in  Appendix 
B.  in  predicting  these  constants  we  will  make  some 
assumptions.  For  example  hashing  to  a  hash  table  requires  10 
memory  references  to  be  made.  Even  though  we  will  not  define 
an  explicit  environment  for  the  Table  representation  we  will 
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assume  we  use  the  sane  environment  that  we  will  use  for  the 
Hash-Incidence-Vector  representation  in  predicting  those 
constants*  This  is  necessary  in  order  to  do  a  fair 
comparison  between  the  Table  representation  and  Bash- 
Incidence-  Vector  representation. 


II.  ANALYSIS  OF  BXTBNS IONAL  ALGORITHMS 

In  this  section  we  will  define  some  of  the  relational 
operations'  algorithms  that  work  on  the  extensional 
representation  structures  and  we  will  determine  the  worst 
case  asymptotical  time  complexity  behaviour  of  these 
algorithms. 

The  reader  can  find  the  analysis  of  the  remaining 
relational  operations'  algorithms  in  Appendix  A. 

A.  FUNCTION  APPLICATION  (F:x) 

Given  an  individual  we  want  to  apply  a  function  to  that 
individual  in  order  to  find  the  corresponding  individual  in 
the  codomain  of  the  function.  We  know  that  functions  are  in 
fact  left  univalent  relations.  This  means  given  an 
individual  in  the  domain  there  exists  a  unique  individual 
corresponding  to  that  individual  in  the  codomain  or  that  no 
individual  exists  in  the  codomain  corresponding  to  the 
individual  in  the  domain.  Now  we  have  to  state  that  fact 
more  carefully. 

Definition:  Let  A  and  B  be  sets.  A  function  f  from  A  to 
Br  denoted: 

f :A->B 


is  a  relation  from  A  to  B  such  that  for  every  acA,  there 
exists  at  most  one  be-  such  that  <a,b>«f.  In  this  case  we 
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f:a  *  b 

On  the  other  hand  in  the  relatione  case,  given  an  individual 
in  the  domain  of  the  relation  we  may  find  more  than  one 
individual  that  is  in  relation  with  that  individual  in  the 
codomain  of  the  given  relation. 

In  our  case  the  domain  is  the  RIM  set  of  the 
relation/function  and  the  codomain  is  the  LEM  set  of  the 
relation/function.  In  our  system  function  application 
operation  is  also  defined  for  relations.  This  may  seem 
dangerous  to  the  reader,  but  we  have  other  operations  such  as 
"Unit  image”  that  returns  the  set  of  individuals  in  the 
codomain  which  are  in  relation  with  the  given  individual  in 
the  domain  so  it  is  the  user's  responsibility  to  use  the 
apropriate  operation  when  he/she  is  programming.  The  reason 
for  doing  this  is  we  will  treat  function  application  in  a 
special  manner  to  make  this  operation  faster  (constant  time) 
because  the  "Function  Application"  operation  is  a  very 
frequently  used  primitive  function  of  the  system.  If  we 
check  to  detect  if  more  than  one  individual  exists  in  the 
codomain  for  the  given  individual  in  the  domain,  this 
operation  becomes  an  order  "n”  operation  in  the  hash 
incidence  vector  representation.  So  there  is  no  need  to 
accept  an  0(n)  algorithm  for  this  operation  when  there  are 
other  operations  that  serve  the  user  in  the  relations  case. 
For  example  if  the  user  wants  to  learn  the  salary  of  an 
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employee  by  applying  a  relation  (that  relates  the  salaries  to 
the  employees)  to  the  given  employee  name,  it  is  obvious  that 
the  relation  is  a  left  univalant  relation  and  he  can  do  that 
without  fear.  In  fact  there  exists  many  relations  that  are 
obviously  left  univalant  and  the  user  should  be  able  to 
perfora  this  fast  operation  on  those  relations.  In  addition 
to  that,  the  user  aay  want  to  use  this  operation  instead  of 
"Unit  Image”  operation  even  though  it  is  known  that  the 
relation  to  be  applied  is  not  left  univalant.  If  the  user  is 
not  sure  that  the  relation  in  question  is  left  univalant  or 
not,  he  should  use  the  "Unit  Iaage"  operation  to  obtain  the 
set  of  individuals  that  are  in  relation  with  the  given 
individual,  then  he/she  should  apply  the  "Unit  Class 
Selector”  operation  to  the  resulting  set.  This  operation 
calls  the  "Error  Handler”  if  the  argument  set  is  not  a 
singleton  set. 

As  we  mentioned  earlier,  because  we  use  this  operation 
very  frequently  we  have  to  reduce  the  time  complexity  of  its 
algorithm  to  constant  time.  We  do  this  by  adding  a  pointer 
field  to  the  RHT  record  structure,  namely  PRLM  (Pointer  to 
the  related  left  member).  The  pointer  in  that  field  is  set 
to  the  LHT  record  of  the  individual  which  is  in  relation  with 
the  individual  being  represented  by  the  RHT  record  in 
question.  In  the  same  manner  we  allocate  a  pointer  field  in 
the  RHT  structure  which  we  will  call  PRRM,  that  serves  the 


same  purpose.  Mots  that  v«  still  have  to  construct  the 
incidence  vector  of  a  left  univalent  relation  even  though  it 
nay  seem  unnecessary  at  first  glance?  the  reason  is  the 
converse  of  a  left  univalent  relation  is  not  necessarily  a 
left  univalent  relation  (except  in  the  case  of  bijection)  and 
in  this  case  we  have  to  treat  this  condition  in  the  converse 
operation  as  a  special  case.  In  fact  this  is  not  the  only 
reason;  many  operations  that  we  will  define  algorithms  for 
such  as  the  "Relative  Product"  and  "First  Ancestral" 
operations,  expect  the  argument  relations  to  have  incidence 
vectors.  So  rather  than  adding  this  case  as  a  special  case 
to  each  algorithm  and  constructing  the  incidence  vector  of  a 
left  univalent  relation  when  it  becomes  necessary,  we  had 
better  construct  it  the  first  time  the  relation  is  created. 
In  fact  both  solutions  have  tradeoffs.  If  we  construct  the 
incidence  vector  of  a  left  univalent  relation  the  first  time 
the  relation  is  created  and  if  we  do  not  need  that  incidence 
vector  in  any  operation  in  the  program,  we  waste  space.  On 
the  other  hand  if  we  maintain  the  code  to  construct  the 
incidence  vector  of  a  relation  in  each  operation's  program 
which  requires  that  the  operand  relations  have  their 
incidence  vectors  together  with  them,  we  waste  space  again. 
Which  of  the  solutions  is  advantageous  is  an  implementation 
decision. 


The  algorithm  for  the  Hash-Incidence-Vector 
representation  is  as  follows: 

1.  Get  the  argument  individual. 

2.  Hash  with  that  individual  into  the  RHT  under  the 
given  relation  identifier  (for  the  relation  being  applied  to 
the  given  individual). 

3.  Find  the  RIM  record  of  that  individual  in  the  RIM  set 
of  the  relation  in  question. 

4.  Follow  the  pointer  found  in  the  PRLM  field  of  that 
record  and  reach  the  record  of  the  left  individual  in 
relation  with  the  right  individual  in  question. 

5.  Follow  the  PML  field  of  the  record  found  and  extract 
the  individual  from  the  memory  location  where  it  is  saved  and 
return  it. 

The  worst  case  (also  the  average  case)  asymptotical  time 
complexity  behaviour  of  this  algorithm  is  obviously  constant 
time  (0(c)).  Because  no  matter  how  large  the  relation  is,  we 
always  make  the  same  number  of  memory  references. 

Now  we  have  to  define  the  algorithm  for  the  table 
representation.  The  algorithm  is  as  follows: 

1.  Get  the  individual. 

2.  Start  from  the  beginning  of  the  relation's  table; 
proceed  down  in  the  table  record  by  record  by  following  the 
links  between  the  records. 


3.  For  each  record  round,  compare  the  argument 
individual  with  the  individual  represented  by  the  "right” 
field  of  that  record. 

4.  The  first  time  a  match  is  found,  return  the 
individual  represented  by  the  "left"  field  of  the  current 
table  record. 

Mow  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

The  worst  case  complexity  function  of  this  algorithm  can 
be  written  as: 

f  »  R*p  +  C 
where : 

p  *  Table  size/Relation  size. 

K  *  The  constant  number  of  memory  references  made  for 
each  table  record  found. 

C  *  The  constant  number  of  memory  references  made  by  the 
housekeeping  operations.  (In  this  case  C  is  very 
small  because  there  is  no  need  to  update  any  global 
table . ) 

In  the  worst  case  the  relation  may  be  a  universal  relation  on 
its  LBM  and  RIM  sets,  or  in  other  words  the  relation  may  be 
equal  to  the  cartesian  product  of  its  LEM  and  RIM  set.  By 


assuming  the  LEM  and  the  RIM  sets  have  the  common  cardinality 
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V,  we  can  substitute: 

n*n 

in  place  of  ”p”  in  the  above  function.  So  the  complexity 
function  becomes: 

f  »  R*(n2)  +  c 

So  by  looking  at  the  exponent  of  the  term  with  the  larger 
exponent #  we  conclude  that  the  above  polynomial  has  the 
asymptotical  behaviour  of  order  2#  and  in  turn  we  conclude 
that  the  algorithm  has  the  worst  case  asymptotical  time 
complexity  behaviour  of  0(n2). 

B.  UNIT  IMAGE  ( (unimg :R) :x) 

This  operation#  given  an  individual  in  the  domain  (RIM 
set)  of  the  relation#  returns  the  set  of  individuals  that  are 
in  relation  with  the  given  individual  in  the  codomain  (LEM 
set)  of  the  relation. 

The  algorithm  for  the  Hash-Incidence-Vector 
representation  is  as  follows: 

1.  Get  the  argument  individual. 

2.  Get  the  relation  identifier#  hash  with  that  relation 
identifier  into  the  relation  table  and  find  the  record  of  the 
relation  in  the  relation  table.  Pollow  the  pointer  found  in 
the  PFLM  field  of  that  record#  and  find  the  first  left 
member's  record  in  the  LQ4  set  of  the  relation. 
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3.  Hash  with  the  argument  individual  into  the  RHT  under 
the  given  relation  identifier,  find  the  RHT  record  of  the 
argument  individual  and  extract  the  contents  of  the  "index" 
field  of  that  record.  Record  the  resulting  integer  in  the 
temporary  variable  "tempi". 

4.  Start  from  the  beginning  of  the  LEM  set  of  the 
relation  (the  first  record  is  found  in  step  2),  and  proceed 
down  in  the  LEM  set  record  by  record  by  following  the  TASE 
links  between  the  records.  For  each  record  found  in  this 
manner  extract  the  contents  of  the  index  field,  reference  the 
incidence  vector  of  the  relation  with  this  index  and  the 
index  stored  in  the  variable  "tempi"  by  using  the  "reference 
algorithm".  If  a  1  is  found  in  the  corresponding  incidence 
vector  location  then,  hash  into  the  SHT  with  the  current  left 
individual  under  the  set  identifier  which  will  be  described 
in  step  5,  and  establish  a  set  record.  Copy  the  PML  field  of 
the  current  LEM  set  record  into  the  PML  field  of  that  record. 
If  this  is  the  first  set  record  created,  mark  it  with  pointer 
"P".  Link  the  set  records  created  in  this  manner  to  each 
other  by  their  TASE  links.  Keep  a  count  beginning  with  0  and 
increment  it  for  each  set  record  created,  with  0. 

5.  Hash  to  the  set  table  (ST)  with  the  identifier  of  the 
resulting  set,  which  is: 

"u$img$"  (relation's  identifier)  (individual's  identifier) 


Establish  the  record  o£  this  set*  put  the  pointer  "P"  into 
the  "PSS"  field  and  put  the  last  value  of  the  count  into  the 
"CARD"  field  of  that  record. 

We  write  the  worst  case  complexity  function  of  that 
algorithm  as  follows: 

f  -  K*n  +  C 
where: 

n  •  The  cardinality  of  the  LEM  set  of  the  given  relation. 

K  ■  Constant  number  of  memory  references  made  for  each 
LHT  record  found  while  proceeding  in  the  LEM  set  of 
the  relation  in  Step  4. 

C  ■  Constant  number  of  memory  references  made  in  steps  1# 
2,  3  and  5. 

By  looking  at  the  exponent  of  the  term  with  the  larger 
exponent  we  conclude  that  the  worst  case  asymptotical  time 
complexity  behaviour  of  this  algorithm  is  0{n),  where  n  is 
the  cardinality  of  the  LEM  set  of  the  relation. 

Now  we  have  to  define  the  algorithm  for  table 
representation.  The  algorithm  is  as  follows: 

1.  Get  the  argument  individual. 

2.  Start  from  the  beginning  of  the  relation's  table, 
proceed  down  in  the  table  record  by  record,  by  following  the 
links  between  the  records.  For  each  record  found  in  this 
manner,  compare  the  argument  individual  with  the  individual 
represented  by  the  "right”  field  of  that  record.  If  a  match 
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is  found/  hash  into  the  SET  with  the  current  left  individual 
under  the  set  identifier  described  in  step  5  of  the  previous 
algorithm/  establish  a  new  set  record  and  copy  the  "left" 
field  of  the  current  table  record  to  the  PML  field  of  this 
set  record.  If  this  is  the  first  set  record  created  in  this 
manner/  mark  it  with  a  pointer  and  link  the  set  records 
created  in  this  manner  to  each  other  by  their  TASE  links. 
Keep  a  count  beginning  with  0  and  increment  it  for  each  set 
record  created. 

3.  Continue  to  do  step  2  until  the  end  of  the  table  of 
the  relation. 

4.  update  the  set  table  as  it  was  done  in  step  5  of  the 
previous  algorithm. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

We  write  the  worst  case  asymptotical  time  complexity  of 
that  algorithm  as: 

f  »K*p  +  C 
where: 

P  *  Relation  3ize. 

K  ■  Constant  number  of  memory  references  made  for  each 
table  record  found  in  step  2  of  the  algorithm. 

C  *  Constant  number  of  memory  references  made  by  the 
housekeeping  operations  (such  as  the  number  of  memory 
references  made  in  step  4) . 
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We  know  that  in  the  worst  case: 

p  ■  n*n 
where: 

n  ■  The  common  cardinality  o£  the  LEM  set  and  the  RIM  set 
o£  the  relation  in  question  (Assumption) . 

So  we  rewrite  the  complexity  function  as: 
f  »  K*(n2)  +  c 

So  by  looking  at  the  exponent  o£  the  term  with  the  larger 
exponent  we  conclude  that  the  worst  case  asymptotical  time 
complexity  behaviour  of  this  algorithm  is  0(n2). 

C.  CONVERSE  OF  A  RELATION  (Rc) 

We  formally  express  the  converse  of  a  relation  as: 

Rc  *  {<x,y>|  <y,x>  c  r} 

So  if  D  is  the  digraph  of  R,  the  digraph  of  Rc  can  be 
constructed  from  D  by  reversing  the  direction  of  all  arcs  of 

D.  This  can  be  done  in  the  table  representation  of  a 
relation  by  simply  interchanging  the  columns  of  the  table. 

Since  the  converse  of  a  relation  is  another  relation  it 
participates  in  the  relational  operations  as  the  original 
relation  does*  so  in  the  Hash- Incidence-Vector  representation 
case  we  necessarily  have  to  construct  the  incidence  vector , 
the  LEM  set#  and  the  RIM  set  of  that  resulting  relation.  The 
algorithm  for  the  Hash- Incidence-Vector  representation  is  as 
follows : 
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1.  Get  the  identifier  of  the  original  relation. 

2.  Hash  to  the  relation  table,  find  the  record  of  the 
relation,  follow  the  pointers  in  the  PFLM  and  in  the  PFRM 
fields  of  that  record  and  find  the  records  of  the  first  left 
member  and  the  first  right  member  of  the  relation 
respectively. 

3.  Extract  the  contents  of  the  |LEM|  and  the  |RIM|  field 
of  the  relations  record,  allocate  a  memory  block  as  large  as: 

(|LEM|*|RIM|)/C 

(Where  J  LEM  |  is  the  cardinality  of  the  LEM  set  of  the 
original  relation  and  |RIM|  is  the  cardinality  of  the  RIM  set 
of  the  original  relation  and  C  is  the  memory  word  length.) 

4.  Make  a  separate  copy  of  the  LEM  set  of  the  original 
relation  in  the  RHT  under  the  relation  identifier,  "Rc" 
(i.e.,  the  records  will  contain  identifier  "Rc"  in  their 
"Rid"  fields,  where  R  is  the  identifier  of  the  relation  in 
question).  Keep  a  RIM  set  index  count  and  increment  it  for 
each  record  copied;  put  the  updated  value  of  that  count  into 
the  index  field  of  the  record  created  each  time  a  record  is 
created . 

5.  Make  a  separate  copy  of  the  RIM  set  of  the  original 
relation  in  the  LHT  under  the  relation  identifier  "Rc".  Keep 
a  LEM  set  index  count  and  increment  it  by  the  cardinality  of 
the  RIM  set  of  the  original  relation.  For  each  record 


copied,  put  the  updated  value  of  the  LEM  set  index  count  into 
the  index  field  of  each  new  record  created. 

6.  Extract  the  integer  found  in  the  |RIM|  field  of  the 
original  relation's  record  in  the  relation  table  and  call  it 
K.  Start  from  the  beginning  of  the  original  relation's  LEM 
set  and  proceed  down  in  this  set  record  by  record.  For  each 
record  found  extract  the  integer  stored  in  the  INDEX  field  of 
this  record  and  call  it  L.  Call  the  original  relation's 
incidence  vector  A  and  the  new  relation's  incidence  vector  B 
and  execute  the  loop  below. 

For  j  «  L  to  L+K  by  1  do: 

B[ j  +  | RIM | ]  *  A[j] 

end-do 

7.  Hash  to  the  relation  table  with  the  new  relation 
identifier  "Re",  and  establish  the  record  for  the  new 
relation.  Copy  the  |LEM|  field  of  the  original  relation's 
record  into  the  ]  RXM  |  field  of  that  record;  in  the  same 
manner  copy  the  |RIM|  field  of  the  original  relation's  record 
into  the  | LEM |  field  of  that  record.  Put  the  beginning 
address  of  the  new  incidence  vector  into  the  BASE  field  of 
that  record,  put  the  pointers  to  the  records  of  the  first 
left  member  and  first  right  member  of  the  resulting  relation 
into  the  PFLM  and  the  PFRM  fields  of  that  record 
respectively. 


72 


Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

We  write  the  worst  case  asymptotical  time  complexity 
function  as: 

f  -  P* (m*n)  +  L*n  +  M*m  +  C 
where: 

m  *  The  cardinality  of  the  LEM  set  of  the  original 
relation. 

n  »  The  cardinality  of  the  RIM  set  of  the  original 
relation. 

L  *  The  constant  number  of  memory  references  made  while 
copying  each  record  of  the  RIM  set. 

M  *  The  constant  number  of  memory  references  made  while 
copying  each  record  of  the  LEM  set. 

P  *  The  constant  number  of  memory  references  made  while 
copying  the  entries  of  the  original  incidence  vector 
to  the  corresponding  entries  of  the  new  incidence 
vector . 

In  the  above  function  the  first  term  corresponds  to  step  5, 
the  second  term  corresponds  to  step  4,  the  third  term 
corresponds  to  step  6,  and  the  last  term  (the  constant  C) 
corresponds  to  the  other  steps  of  the  algorithm. 


then  the  complexity  function  becomes: 

f  *  P*(n2)  +  S*n  +  C 

In  step  6  of  the  algorithm  we  had  to  make  a  number  of 
memory  references  proportional  to  the  square  of  n,  where  n  is 
assumed  to  be  the  common  cardinality  of  the  LEM  and  the  RIM 
sets  of  the  original  relation,  as  a  result  of  copying  the 
(n2)  bits  of  the  original  incidence  vector  to  the  (n2)  bits 
of  the  new  incidence  vector.  So,  by  looking  at  the  degree  of 
the  term  with  the  largest  exponent  we  conclude  that  the  worst 
case  asymptotical  time  complexity  behaviour  of  this  algorithm 
is  0(n2). 

Mow  we  have  to  consider  how  this  operation  could  be 
performed  on  the  table  representation.  Obviously  the 
algorithm  is  simpler  in  this  case.  The  algorithm  is  as 
follows: 

1.  Start  from  the  beginning  of  the  table  of  the 
relation,  and  proceed  down  in  the  table.  For  each  table 
record  found  create  a  new  table  record.  Copy  the  "left" 
field  of  the  original  record  into  the  "right”  field  of  the 
new  table  record.  In  the  same  manner,  copy  the  "right”  field 
of  the  original  record  into  the  "left”  field  of  the  new 
record.  Link  the  new  table  records  created  in  this  manner  to 
each  other  by  their  "link"  fields. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 


In  the  worst  case  the  original  relation  may  be  a 
universal  relation  on  its  LEM  and  RIM  sets  Assuming  the  LEM 
set  and  the  RIM  set  cardinalities  are  equal  to  "n"  the 
relation  size  becomes  equal  to  the  square  of  ”n",  so  we  write 
the  worst  case  complexity  function  as: 

f  *  K*(n2)  ♦  C 
where: 

n  -  The  common  cardinality  of  the  LEM  and  the  RIM  set  of 
the  original  relation. 

K  ■  The  constant  number  of  memory  references  made  for 
each  table  record  of  the  original  relation  in  step  1. 

C  ■  The  number  of  memory  references  made  by  the 
housekeeping  operations  such  as  updating  the  relation 
table t  etc. 

So  we  conclude  that  the  worst  case  asymptotical  time 
complexity  behaviour  of  that  algorithm  is  0(n2) . 

D.  SET  OPERATIONS 


The  union  of  two  sets  contains  those  members  which 
are  in  either  one  of  the  two  operand  sets  or  both,  we  can 
formally  express  that  as: 

R  U  S  *  {  x  j  x«R  OR  xc5  } 

Takinq  the  union  of  two  sets  may  involve  a  lot  of 
comparisons  and  exhaustive  searches  in  the  inorderly 
structured  linked  lists  of  the  operand  sets.  Concatenating 


the  linked  lists  of  the  two  sets  and  then  removing  the 
duplicates  may  be  one  solution  but  removing  duplicates  is  a 
very  expensive  operation.  In  addition  we  have  to  preserve 
the  original  sets  while  we  are  obtaining  the  union  of  them, 
so  we  have  to  make  separate  copies  of  the  operand  set 
structures  and  perform  the  operations  on  those  copies. 

Our  solution  for  this  problem  is  to  use  the 
properties  of  the  SBT.  By  establishing  the  resulting  set  in 
the  SBT  the  duplicates  are  automatically  removed. 

The  algorithm  is  as  follows: 

1.  Hash  with  the  first  and  second  operand  set 
identifiers  to  the  set  table,  find  their  records,  follow  the 
PSS  fields  of  those  records,  and  find  the  beginning  records 
~f  the  two  operand  sets. 

2.  Start  from  the  beginning  of  the  first  set's 
linked  list  and  proceed  down  in  the  linked  list  record  by 
record.  For  each  record  found,  hash  into  the  SHT  with  the 
individual  represented  by  that  record  under  the  new  set 
identifier  "RVS".  Establish  the  record  of  that  individual  in 
the  SHT  only  if  there  is  no  record  for  that  individual  in  the 
SHT  already.  Link  the  records  of  the  individuals  in  the  SHT 
by  their  TASE  links  as  they  are  created. 

3.  Start  from  the  beginning  of  the  other  set's 
linked  list  and  proceed  down  in  the  linked  list  record  by 
record.  For  each  record,  hash  with  the  individual 


represented  by  that  record  into  the  SHT  under  the  new  set 
identifier.  Establish  its  record  in  the  SHT  if  there  is  no 
record  for  that  individual  in  the  SHT  already. 

4.  Hash  to  the  set  table  with  the  set  identifier 
”RVS*  where  R  is  the  identifier  of  the  first  operand  set  and 
S  is  the  identifier  of  the  second  operand  set.  Establish  the 
record  of  that  set  in  the  set  table  and  put  the  pointer  to 
the  linked  list  structure  established  in  the  SHT  into  the  PSS 
field  of  this  record. 

Note  that  the  order  of  the  operand  sets  is  arbitrary 
so  if  we  establish  the  record  of  the  set  under  the  identifier 
"RVS",  a  subsequent  reference  to  the  set  "SVR"  may  cause  the 
same  set  to  be  reconstructed  again.  Of  course  we  do  not  want 
that,  so  we  have  to  accept  a  convention  and  let  the  system 
realize  that  convention.  We  assume  that  when  a  reference  to 
the  union  of  two  sets  is  made,  the  system  first  looks  up  the 
set  table  for  the  record  of  this  set,  if  it  is  not  already 
present  there,  it  takes  the  identifier  of  the  referenced  set 
apart,  with  the  character  ”V”  being  the  pivot  character,  and 
interchanges  the  operand  set  identifiers  with  the  character 
"V"  being  in  the  center.  Then  the  system  hashes  with  the 
resulting  identifier  to  the  set  table  and  looks  for  the 
record  of  that  set.  If  there  is  no  record  for  that  set  in 
the  set  table,  it  executes  the  algorithm  given  above. 
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Mow  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

Clearly  the  algorithm  goes  through  both  operand  sets 
once.  Assuming  the  cardinalities  of  the  operand  sets  are 
equal,  the  worst  case  time  complexity  function  of  that 
algorithm  can  be  written  as: 

f  -  2*Kl*n  +  K2*m  +  C 

where: 

n  ■  The  common  cardinality  of  the  operand  sets, 
m  *  The  cardinality  of  the  second  argument  set  ■  n. 
Kl*  The  number  of  memory  references  made  for  each 
record  found  in  the  set  while  proceeding  in  the 
set  in  step  2. 

K2»  The  number  of  memory  references  made  for  each 
record  found  in  the  set  while  proceeding  in  the 
set  in  step  3. 

C  ■  The  number  of  memory  references  made  in  steps  1 
and  4. 

Let: 

K  »  K2  ♦  Kl 

then  the  worst  case  complexity  function  becomes: 
f  -  K*n  ♦  C 

So  clearly,  the  worst  case  asymptotical  time  complexity 
behaviour  of  this  algjrithm  is  0(n). 
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.  Set  Intersection  (R/\S 


The  intersection  of  two  sets  contains  those  members 
which  occur  in  both  operand  sets.  That  can  be  formally 
written  as: 

R/\S  ■  { x | xcR  A  xcS  } 

We  will  use  the  SHT  mechanism  for  this  operation  like  we  did 
in  the  "set  union"  operation.  The  algorithm  is  as  follows: 

1.  Hash  with  the  identifiers  of  the  operand  sets  to 
the  set  table#  find  their  records#  follow  the  pointers  in  the 
PSS  fields  of  those  records  and  find  the  first  records  of  the 
linked  list  structures  of  those  sets. 

2.  Start  from  the  beginning  of  the  first  operand  set 
(order  is  not  important)  and  proceed  down  in  the  linked  list 
of  the  set  record  by  record.  Por  each  record  found  hash  into 
the  SHT  with  the  individual  being  represented  by  that  record 
under  the  second  operand  set's  identifier.  If  this 
individual  also  has  a  record  in  that  set  structure#  hash  to 
the  SHT  with  this  individual  again#  but  this  time  under  the 
new  set's  identifier  "R /\S*  and  establish  the  record  for  that 
individual  in  the  SHT#  if  there  is  no  record  for  that 
individual  in  the  SHT  already.  Link  the  records  of  the 
individuals  created  in  the  SHT  to  each  other  by  their  TASE 
links  as  they  are  created.  Keep  a  cardinality  count 


beginning  with  0  and  for  each  record  created  increment  this 


count*  Mark  the  first  record  of  the  resulting  set  structure 
with  the  pointer  P. 


3.  Hash  to  the  set  table  with  the  identifier  of  the 
resulting  set#  which  is  "R/\Sm,  establish  the  record  of  that 
set  in  the  set  table#  put  pointer  P  into  the  PSS  field  of 
that  record#  and  put  the  last  value  of  the  cardinality  count 
into  the  "CARD"  field  of  that  record. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

We  write  the  worst  case  complexity  function  of  that 
algorithm  as: 

f  -  Kl*n  +  C 

where  Kl#  C#  n  and  m  mean  the  same  as  the  corresponding 
parameters  defined  in  the  set  union  operation. 

Let  the  cardinality  of  the  operand  sets  be  equal# 
then  the  complexity  function  becomes: 

f  -  Kl*n  +  C 

So  we  conclude  that  the  algorithm  has  the  worst  case 
asymptotical  time  complexity  behaviour  of  0(n) . 

3.  Set  Difference  (R-S) 

The  difference  of  two  sets#  R  and  S  contains  those 
members  which  are  in  R  but  not  in  S.  This  can  be  written 
formally  as: 

R  -  S  ■  { x | xc r  and  not(xcS)  } 


We  will  use  SHT  for  this  operation  too.  The  algorithm  is  as 
follows: 

1.  Hash  with  the  identifiers  of  the  sets  to  the  set 
table,  find  their  records,  follow  the  pointers  found  in  the 
PSS  fields  of  those  records  and  find  the  first  record  of  each 
set. 

2.  Start  from  the  beginning  of  the  first  operand 
set's  linked  list  (call  it  R).  (*  The  first  operand  set  will 
be  accepted  and  called  the  reference  set  because  the  set 
difference  operation  is  not  commutative.  *)  Proceed  down  in 
the  linked  list  record  by  record.  For  each  record  found, 
hash  with  the  individual  represented  by  that  record  into  the 
SHT  under  the  second  operand  set's  identifier  and  check  if  a 
record  of  this  individual  exists  in  that  set  structure.  If 
so  do  nothing,  else  hash  with  this  individual  to  the  SHT 
under  the  new  set  identifier.  Create  a  record  for  this 
individual  in  the  SHT.  Link  the  records  created  in  this 
manner  to  each  other  by  their  TASE  links  as  they  are  created. 

3.  Hash  to  the  set  table  with  the  new  set  identifier 
"R-S"  and  establish  its  record  and  put  the  pointer  to  the 
beginning  of  new  linked  list  structure  into  the  PSS  field  of 
that  record.  Put  the  last  value  of  the  cardinality-count 
into  the  "size"  field  of  that  record. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 
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We  write  the  complexity  function  as  follows: 
f  -  Kl*n  +  C 

where  Kl  and  C  are  as  defined  in  the  union  operation  case. 
In  this  case  n  represents  the  cardinality  of  the  first 
operand  set;  the  cardinality  of  the  second  operand  set  does 
not  have  any  affect  on  the  complexity  function  because  we 
make  K  memory  references  for  each  record  of  the  first 
argument  set  in  step  2.  Of  course  this  is  true  in  the  case 
the  second  operand  set  is  already  in  the  SHT.  Since  this 
operation  takes  only  those  sets  that  are  known  by  the  system 
as  operands  and  we  represent  the  sets  that  are  known  by  the 
system  in  the  SHT,  the  mechanism  is  well  defined.  Thus  we 
conclude  that  the  worst  case  asymptotical  time  complexity 
V  •  behaviour  of  that  algorithm  is  0(n). 

E.  INITIAL  MEMBERS  (init:R) 

The  initial  members  of  a  relation  are  the  left  members 
that  are  not  right  members.  This  can  be  stated  formally  as: 

init:R  ■  {  x  |  for  some  y,  xRy  and  not  yRx  } 

Our  "init"  operation  is  supposed  to  take  a  relation 
identifier  as  argument  and  return  a  set  of  individuals  that 
are  the  initial  members  of  the  given  relation. 

The  algorithm  for  "Hash-Incidence-Vector"  representation 
is  given  below: 

1.  Get  the  relation  identifier. 
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2.  Hash  with  that  relation  identifier  into  the  relation 
table;  find  the  record  of  that  relation. 

3.  Follow  the  pointer  found  in  the  PFLM  field  of  that 
record  and  find  the  LET  record  of  the  first  left  member  of 
that  relation. 

4.  Find  the  individual's  identifier  by  following  the 
pointer  found  in  the  PML  field  of  that  record. 

5.  Hash  into  the  RHT  with  the  individual's  identifier 
found  in  step  4  under  the  relation  identifier  in  question. 
If  there  is  no  RHT  record  for  that  individual  in  the  RHT, 
then  hash  with  that  individual  into  the  SHT  under  the  new  set 
identifier  ("init:" (relation  identifier)),  and  establish  its 
record.  If  it  is  the  first  record  established  in  this  manner 
then  mark  it  with  a  pointer. 

6.  Find  the  next  individual's  record  by  following  the 
pointer  found  in  th  TASE  link  field  of  the  current  record  in 
the  LHT;  repeat  steps  4  and  5  for  that  individual. 

7.  Repeat  steps  4,  5  and  6  until  the  LEM  set  of  the 
relation  is  exhausted.  As  the  records  are  created  in  the  SHT 
link  them  to  each  other . 

8.  Establish  the  record  of  the  set  created  above  in  the 
Set  Table  under  the  identifier  "init:R",  where  "R"  is  the 
identifier  of  the  relation  in  question.  Put  the  pointer  P 
(that  was  set  in  step  5)  into  the  PSS  field  of  that  record. 
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Once  we  establish  the  record  of  this  set  in  the  set  table 
any  subsequent  references  made  to  this  operation  take 
constant  time.  Because  we  are  trying  to  find  the  worst  case 
behaviour  we  had  to  write  the  costly  part  of  the  algorithm. 

As  we  can  see,  step  7  of  the  algorithm  causes  the  worst 
case  asymptotical  time  complexity  behaviour  of  the  algorithm 
to  the  0(n),  where  "n"  is  the  cardinality  of  the  LEM  of  the 
relation. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

We  define  the  worst  case  time  complexity  function  of  that 
algorithm  as: 

f  -  K*n  +  C 
where: 

n  »  The  cardinality  of  the  LEM  set  of  the  relation. 

K  *  The  number  of  memory  references  made  for  each  record 
of  the  LEM  set  of  the  argument  relation  in  step  5. 

C  *  The  number  of  memory  references  made  in  steps  1,  2, 
3,  4  and  8. 

By  looking  at  the  worst  case  time  complexity  function  we 
conclude  that  the  worst  case  asymptotical  time  complexity 
behaviour  of  this  algorithm  is  0(n). 

Now  we  have  to  define  the  algorithm  that  works  on  the 
table  representation.  Suppose  our  algorithm  begins  to 
examine  the  individuals  on  the  left  column  one  by  one  and  for 


each  one  performs  an  exhaustive  linear  search  in  the  right 
column  to  decide  if  the  individual  in  question  is  also 
present  in  the  right  column.  This  would  obviously  be  an 
order  two,  0(p2)  algorithm,  where  p  is  the  relation  size. 
But  we  may  use  the  SHT  mechanism  again  to  reduce  the  time 
complexity  of  the  algorithm. 

The  steps  of  the  algorithm  are  given  below: 

1.  Start  from  the  beginning  of  the  left  column.  Proceed 
down  in  the  left  column  of  the  table  by  following  the  link 
fields  of  the  table  records  and  by  looking  up  the  individual 
from  the  "left"  field  of  each  table  record.  In  fact  the 
individual  is  not  directly  obtainable  from  the  "left"  field 
because  one  level  of  indirection  is  involved.  That  means  it 
has  to  follow  the  pointer  found  in  the  "left"  field  of  that 
record  in  order  to  find  the  individual.  Por  each  individual 
hash  into  the  SHT  under  the  new  set  identifier  described  in 
step  5  of  the  previous  algorithm,  establish  its  record  and 
link  the  records  as  they  are  created  in  SHT.  If  the  record 
is  the  first  record  created  in  this  manner  mark  that  record 
with  the  pointer  P. 

2.  After  the  left  column  is  exhausted  start  from  the 
beginning  of  the  right  column  and  proceed  down  in  the  right 
column.  Por  each  individual  found  in  the  right  column,  hash 
into  the  SHT  with  that  individual  under  the  new  set's 
identifier.  If  the  record  of  that  individual  is  already 
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present  in  the  SHT,  delete  it.  After  deletion  update  the 
links  between  the  records  created  in  the  SHT  appropriately. 


3.  Establish  the  resulting  set's  record  in  the  set  table 
(as  is  done  in  the  set  operations'  algorithms).  (*  We  did 
not  explain  the  steps  of  that  algorithm  in  detail,  because 
the  steps  are  similar  to  the  steps  of  the  algorithm  defined 
for  the  set  operations.  *) 

This  algorithm  requires  one  exhaustive  linear  search  of 
the  left  column  and  one  exhaustive  search  of  the  right 
column. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

The  worst  case  complexity  function  of  that  algorithm  can 
be  written  as: 

f  -  K*p  +  M*p  +  C 

let  N  ■  K  +  M,  then  the  function  becomes: 

f  -  N*p  +  C 
where: 

p  *  Relation  size  (table  size  or  equivalently  the  number 
of  tuples  in  the  relation) . 

n  »  The  cardinality  of  the  LEM  set  of  the  relation. 

N  *  The  number  of  memory  references  made  (averaged)  in 
each  iteration  of  step  1  and  2. 

C  ■  The  number  of  memory  references  made  by  the 
housekeeping  operations. 
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We  know  that  in  the  worst  case  the  relation  nay  be  equal  to 
the  cartesian  product  of  its  LEM  set  and  the  RIM  set,  which 
neana  the  relation  size  is  equal  to  the  product  of  the 
cardinalities  of  the  LEM  set  and  the  RIM  set  of  the  relation. 
So  in  the  worst  case  we  see  that: 

p  »  n*m 
where: 

n  *  The  cardinality  of  LEM  set. 
n  -  The  cardinality  of  RIM  set. 
let  n  *  a,  then: 
p  *  n*n 

If  we  apply  this  result  to  the  above  complexity  function,  it 
is  obvious  that  the  algorithm  has  the  worst  case  asymptotical 
time  complexity  behaviour  of  0(n2). 

F.  RIGHT  RESTRICTION  (R\C) 

It  is  often  useful  to  limit  the  domain  of  a  relation. 
This  operation,  given  a  set  and  a  relation,  restricts  the  RIM 
set  of  the  given  relation  to  the  given  set.  We  can  express 
the  effect  of  this  operation  as  follows: 

R\S  *  {  <x,y> |  <x,y>eR  /\  ycS  } 

So  we  bind  the  domain  of  the  relation  R  to  the  intersection 
of  the  domain  and  the  set  S.  It  is  clear  that  the  operation 
should  extract  those  individuals  which  are  not  in  the  given 
set  from  the  RIM  set  of  the  relation. 


We  will  first  define  the  algorithm  for  the  Hash- 
Incidence-Vector  representation.  The  algorithm  is  as 
follows : 

1.  Get  the  relation  identifier  and  the  set  identifier. 
Pind  their  records  in  the  relation  table  and  in  the  set  table 
respectively  by  hashing  to  those  tables  with  these 
identifiers. 

2.  Pol low  the  pointer  found  in  the  PPLM  field  of  the 
relation's  record  and  find  the  record  of  the  first  left 
member  in  the  LHT.  Begin  from  the  beginning  of  the  linked 
list  structure  of  the  LEM  set  of  the  relation  and  proceed 
down  in  that  linked  list  record  by  record.  Por  each  record 
found,  hash  to  the  LHT  with  the  individual  represented  by 
that  record,  under  the  relation  identifier: 

(Relation  identifier) '\' (set  identifier) 
and  establish  its  LHT  record.  Link  the  records  created  in 
this  manner  to  each  other  as  they  are  created  in  the  LHT. 
(*  This  step  effectively  makes  a  separate  copy  of  the  LEM  set 
of  the  original  relation,  which  becomes  the  LEM  set  of  the 
new  relation.  *) 

3.  Test  if  the  given  set  is  represented  extensionally  or 
intensionally  by  following  the  pointer  in  the  PSS  field  of 
the  set  record.  If  it  is  detected  to  be  extensionally 
represented  do  the  steps  below: 
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a.  Follow  the  pointer  found  in  the  PFRM  field  of  the 


relation's  record.  Start  from  the  beginning  of  the 
relation's  RIM  set;  proceed  in  this  set  record  by  record;  for 
each  record  found#  hash  into  the  SHT  with  the  individual 
being  represented  by  this  record  under  the  argument  set 
identifier.  If  a  record  for  this  individual  is  present  in 
this  set  structure#  make  a  separate  copy  of  the  RIM  set 
record  of  that  individual  in  the  RHT  (like  was  done  for  the 
left  individuals'  records  in  step  2)  under  the  new  relation's 
identifier.  Link  the  records  created  in  this  manner  in  the 
RHT  by  their  TASE  links.  (*  This  step  effectively  copies 
those  RIM  set  records  of  the  original  relation  which 
represent  some  individual  in  the  argument  set#  into  the  new 
RIM  set  of  the  new  relation.  *) 

b.  Hash  to  the  relation  table  under  the  new 
relation's  identifier.  Establish  the  record  of  this  relation 
in  the  relation  table  with  the  new  relation  identifier  being 
in  the  "Rid"  field.  Copy  the  | LEM  |  and  the  BASE  fields  of 
the  original  relation's  record  into  the  | LEM |  and  the  BASE 
fields  of  the  new  relation's  record  respectively.  (*  So  the 
new  relation  makes  use  of  the  original  relation's  incidence 
vector.  *) 

4.  if  the  argument  set  is  detected  to  be  intensionally 
represented  do  the  steps  below: 
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a.  Start  from  the  beginning  of  the  linked  list 
structure  of  the  original  relation's  RIM  set,  proceed  down  in 
the  RIM  set  record  by  record.  For  each  record  found  in  this 
manner  test  if  the  individual  being  represented  by  that 
record  is  a  member  of  the  argument  set.  (*  This  membership 
test  will  be  explained  further  in  the  explanation  of  the 
intensional  representation  structures.  *)  If  this  individual 
is  in  the  argument  set  then  hash  with  that  individual  into 
the  RHT  under  the  new  relation  identifier  and  copy  all  the 
fields  (except  the  TASE  and  Rid  fields)  of  the  RIM  set  record 
(which  belongs  to  the  original  relation)  into  the  new 
record's  corresponding  fields.  Put  the  new  relation's 
identifier  into  the  Rid  field  of  that  record.  Link  the 
records  created  in  this  manner  to  each  other  by  their  TASE 
links  as  they  are  created. 

b.  Do  step  3-b. 

Now  we  will  do  the  worst  case  asymptoticaly  time 
complexity  analysis  of  this  algorithm. 

The  right  restricted  relation  makes  use  of  the  original 
relation's  incidence  vector,  which  significantly  reduces  the 
time  complexity  of  the  resulting  algorithm.  We  expect  that 
most  of  the  time  the  cardinality  of  the  argument  set  will  be 
smaller  than  the  cardinality  of  the  RIM  set  of  the  relation, 
but  of  course  that  may  not  be  true  all  the  time,  i.e.,  we  do 
not  have  a  restriction  on  the  cardinality  of  the  argument 
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set.  In  the  worst  case  the  argument  set  may  be  a  super  set 
of  the  RIM  set  of  the  original  relation,  in  that  case  we  have 
to  copy  all  of  the  RIM  set  of  the  original  relation  in  order 
to  obtain  the  resulting  relation's  RIM  set.  (So  the 
resulting  relation  becomes  exactly  equal  to  the  original 
relation.)  Under  these  considerations  we  write  the  worst 
case  time  complexity  function  of  that  algorithm  as: 

f  *  K*m  +  L*n  +  C 
where: 

m  *  The  cardinality  of  the  LEM  set  of  the  original 
relation. 

n  ■  The  cardinality  of  the  RIM  set  of  the  original 
relation. 

K  *  The  constant  number  of  memory  references  made  while 
copying  each  LEM  set  record. 

L  ■  The  constant  number  of  memory  references  made  while 
copying  each  RIM  set  record. 

C  *  The  constant  number  of  memory  references  made  by  the 
housekeeping  operations. 

In  the  above  function  the  first  term  corresponds  to  step  2, 
the  second  term  corresponds  to  step  3,  and  the  last  term 
corresponds  to  the  other  steps  of  the  algorithm. 


Let  m«n  and  Z-k+l  then  the  complexity  function  becomes: 

f  *  Z*n  +  C 

Thus  it  is  clear  that  the  algorithm  has  the  worst  case 
asymptotical  time  complexity  behaviour  of  0(n) . 

Now  we  have  to  define  the  algorithm  for  the  table 
representation.  The  algorithm  is  as  follows: 

1.  Start  from  the  beginning  of  the  linked  list  structure 
of  the  relation's  table  and  proceed  down  in  the  table  record 
by  record  by  following  the  links  beteen  the  records.  For 
each  record  found  in  this  manner  hash  with  the  individual 
represented  by  the  "right"  field  of  that  table  record  into 
the  SHT  under  the  argument  set  identifier.  If  a  record  of 
that  individual  is  already  present  in  the  SHT  then  create  a 
new  table  record  (that  will  belong  to  the  restricted 
relation).  Copy  the  "right"  and  "left"  fields  of  the 
original  relation's  record  to  the  corresponding  fields  of  the 
new  record.  Link  the  new  table's  records  created  in  the 
above  manner  to  each  other  as  they  are  created. 

The  algorithm  seems  simpler  than  the  previous  one,  but  in 
the  worst  case  we  can  not  say  it  is  less  costly. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

In  the  worst  case  the  relation  may  tbe  a  universal 
relation  on  its  LEM  set  and  RIM  set  (note  that  this  is 
different  from  saying  "the  relation  is  the  universal  relation 
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on  its  MEM  set").  This  means  it  contains  all  of  the  tuples 
that  can  be  constructed  from  the  LEM  and  RIM  sets,  and  each 
tuple  of  that  relation  will  have  its  left  component  from  the 
LEM  set  and  it  will  get  its  right  component  from  the  RIM  set. 
This  relation  is  in  fact  the  cartesian  product  of  the  LEM  set 
and  the  RIM  set.  In  this  case  the  number  of  tuples  in  the 
resulting  relation  will  be  equal  to  the  product  of  the 
cardinalities  of  the  LEM  and  the  RIM  set.  By  assuming  that 
the  LEM  set  and  the  RIM  set  of  the  original  relation  have  the 
common  cardinality  "n"  we  write: 
p  •  n*n 

where  "p"  is  the  size  of  the  relation.  Note  that  in  step  1 
we  get  through  the  whole  structure  of  the  argument  set  and  in 
step  2  we  get  through  the  linked  list  structure  of  the 
relation's  table.  In  addition  in  the  worst  case  the  argument 
set  may  be  a  super  set  of  the  RIM  set  of  the  original 
relation;  in  that  case  we  necessarily  copy  the  whole  table  of 
the  relation,  which  means  the  restricted  relation  and  the 
original  relation  become  exactly  equal  to  each  other.  So  we 
write  the  worst  case  time  complexity  function  of  that 
algorithm  as: 
f  »  T*p  +  C 


where: 


p  *  Relation  size. 

T  »  The  constant  number  of  memory  references  made  for 
each  record  of  the  relation  in  step  1. 

C  *  The  constant  number  of  memory  references  made  by  the 
housekeeping  operations. 

In  the  above  function  the  first  terra  corresponds  to  step  1  of 
the  algorithm.  If  we  substitute  (n2)  for  "p"  in  the  above 
function  we  get: 
f  «  T*(n2)  +  C 

So  the  algorithm  has  the  worst  case  asymptotical  time 
complexity  behaviour  of  0(n2),  where  "n"  is  the  common 
cardinality  of  the  LEM  set  and  the  RIM  set  of  the  original 
relation. 

G.  LEFT  RESTRICTION  (C/R) 

It  is  often  useful  to  limit  the  codomain  of  a  relation, 
so  this  operation  takes  a  set  identifier  and  a  relation 
identifier  and  restricts  the  LEM  set  of  that  relation  to  the 
given  set.  This  means  that  after  this  operation  is  performed 
there  remains  only  those  individuals  in  the  LEM  set  of  the 
resulting  relation  which  are  in  the  argument  set.  We  can 
state  this  as  follows: 

S/R  -  {  <y,x>|  <y,x>eR  /\  ycS  } 

The  algorithms  for  "Left  Restriction"  operation  are 
essentially  the  same  as  the  algorithms  for  the  "Right 
Restriction"  operation  for  both  representation  techniques. 
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The  only  difference  is  we  bind  the  LEM  set  in  the  Hash- 
Incidence-Vector  representation  or  left  column  in  the  table 
representation  instead  of  the  RIM  set  in  the  Hash-Incidence- 
Vector  representation  or  the  right  column  in  the  table 
representation.  So  there  is  no  need  to  rewrite  the 
algorithms  and  repeat  the  complexity  analysis. 

H.  RELATIVE  PRODUCT  (RS) 

This  operation  takes  two  relation  identifiers  and 
produces  another  relation  which  is  the  relative  product  of 
the  given  relations  in  the  order  they  have  been  given. 

Thi3  operation  has  an  expensive  algorithm  because  each 
tuple  of  the  resulting  relation  may  originate  from  the 
presence  of  many  different  tuples  in  the  argument  relations. 

The  algorithm  for  Hash- Incidence-Vector  representation  is 
as  follows: 

Let  the  first  argument  relation  be  R  and  the  second 
argument  relation  be  S: 

1.  Find  the  records  of  the  argument  relations  in  the 
relation  table  by  hashing  with  their  identifiers  to  the 
Relation  table  (RT) . 

2.  Make  separate  copies  of  the  LEM  set  of  the  relation 
Rand  the  RIM  set  of  the  relation  S  in  the  LHT  and  in  toe  an? 
respectively  under  the  relation  identifier  *RS".  wr .  -  * 
that  keep  a  LEM  set  index  count  and  for  »a  - 


created,  increment  f.  is  count 
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sot  of  relation  S.  Bach  time  a  LBN  sot  record  is  created, 
put  tho  updated  value  of  this  count  into  the  index  field  of 
this  LBM  set  record.  Establish  the  record  of  the  new 
relation  in  the  relation  table  under  the  relation  identifier 
"RS".  Bstablish  the  pointers  to  the  LBN  set  and  the  RIN  set 
of  that  relation  into  the  P7LN  and  PFRN  fields  of  the  record 
respectively.  Copy  the  lLBN|  field  of  the  relation  R's 
record  into  the  |LBN|  field  of  the  new  record?  in  the  same 
manner»  copy  the  |RIN|  field  of  the  relation  S's  record  into 
the  | RIN |  field  of  the  new  record.  Allocate  a  block  of 
memory  as  large  as: 

( |  LBN  |  *  |  RIN  |  )  /C 

where  C  is  the  memory  word  length.  Put  the  beginning  address 
of  this  block  into  the  "BASE"  field  of  the  new  relation's 
record  i  RT.  Initialize  the  new  incidence  vector.  Bstablish 
the  contents  of  |RIH|  field  of  the  new  relation's  record  in 
the  variable  "CARD”. 

3.  Start  from  the  beginning  of  the  LBN  set  of  the 
relation  RS,  proceed  down  in  the  LBN  set  record  by  record  by 
following  the  TASB  links  between  the  records.  For  each 
record  found  in  that  manner,  extract  the  contents  of  the 
index  field,  put  it  in  variable  "beginl”  and  perform  these 
steps : 
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a.  Find  tha  RIM  set  record  of  the  first  right  member 
of  relation  R  and  proceed  down  in  the  RIM  set  of  the  relation 
R*  by  following  the  TASS  links  between  the  records. 

b.  For  each  tuple  found  which  is  being  represented 
by  the  record  pair  found  in  steps  3  and  3-a,  check  if  the 
relation  R  has  this  tuple  by  hashing  with  the  components 
(individuals)  of  the  tuple  to  the  LHT  and  RHT  and  by  using 
the  reference  algorithm.  If  the  relation  R  does  not  have 
this  tuple  then  do  nothing.  Else/  hash  to  the  LHT  with  the 
individual  found  in  step  3-a  under  the  relation  S.  If  there 
is  no  LEM  set  record  present  for  that  individual  in  the  LEM 
set  of  relation  S  then  again/  do  nothing.  Otherwise/  take 
the  index  stored  in  the  index  field  of  the  LEM  set  record  of 
relation  S  and  put  it  in  variable  ”begin2". 

c .  Take  the : 

begin2  to  begin2  +  CARD 

bits  of  the  incidence  vector  of  relation  S  and  OR  them  with 


the: 


beginl  to  beg ini  +  CARD 


bits  of  the  new  incidence  vector. 

Row  we  will  do  the  worst  case  asymptotical  time 
coaiplexity  analysis  of  this  algorithm. 

The  worst  case  time  complexity  function  of  this  algorithm 
can  be  written  as: 


f  ■  K*n  +  L*a  +  S*n*q*(m/D)  +  R*(n*m)/D  +  C 


The  first  term  of  this  function  corresponds  to  copying  the 
L8M  set  of  the  relation  R  in  the  LHT»  where  *n"  is  the 
cardinality  of  the  L2M  set*  The  second  tern  corresponds  to 
copying  the  RIN  set  of  the  relation  S  in  the  RBT,  where  "a" 
is  the  cardinality  of  the  RIM  set.  The  third  tern 
corresponds  to  step  3,  where  "q"  is  the  cardinality  of  the 
RIM  set  of  relation  Rf  and  constant  0  is  the  nenory  word 
length.  The  tern: 


stands  for  the  nuaber  of  nenory  references  nade  for  each  OR 
operation.  The  fourth  tern  corresponds  to  the  initialisation 
of  the  new  incidence  vector  and  the  last  tern  (constant  C)  is 
the  nuaber  of  nenory  references  nade  by  the  renaining  steps 
of  the  algoritha*  such  as  establishing  the  new  relation’s 
record  in  the  relation  table. 

The  cost  of  fourth  tern  nay  be  reduced  by  pipelining ,  and 
the  cost  of  the  third  tern  can  be  reduced  by  putting  a  large 
portion  of  the  incidence  vector  of  relation  S  into  the  cache 
nenory. 

Let  n*n-q,  W*(S/D),  Y*(R/D)  and  T*K+L,  then  the 

complexity  function  beccaes: 

f  -  W*(n3)  ♦  Y*(n2)  +  T*n  ♦  C 

Obviously  the  algoritha  has  the  worst  case  asyaptotical 
complexity  behaviour  of  0(n3). 
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Now  we  have  to  write  the  algorithm  that  works  on  the 
table  representation.  It  is  natural  to  expect  a  more 
expensive  algorithm  by  the  experience  we  have  had  until  now, 
but  our  task  is  to  find  out  how  expensive  it  is  relative  to 
the  above  algorithm.  The  algorithm  is  as  follows: 

Let  the  first  argument  relation  be  R,  and  the  other  be  S. 

1.  Start  from  the  beginning  of  the  table  of  relation  R; 
for  each  tuple  of  relation  R: 

a.  Look  up  the  right  individual. 

b.  Search  the  left  column  of  the  relation  S  for  that 
individual. 

c.  If  a  tuple  of  relation  S  is  found  to  have  that 
individual  as  the  left  individual,  hash  into  the  SCHT  with 
the  left  individual  of  the  current  tuple  of  relation  R  (i.e., 
the  left  individual  of  the  tuple  from  which  we  get  its  right 
individual  in  step  1-a).  Establish  its  record  in  the  SCHT; 
if  there  is  already  a  record  of  some  left  individual  con¬ 
nected  to  this  hash  table  entry,  look  up  the  individual  being 
represented  by  that  record,  if  it  is  the  same  individual  do 
step  1-d,  else  search  all  the  following  neighbouring  occupied 
hash  table  entries  for  the  record  of  that  individual.  If  it 
can  not  be  found,  establish  the  record  of  that  left  indivi¬ 
dual  and  connect  it  directly  to  the  first  neighbouring  unoc¬ 
cupied  hash  table  entry,  which  follows  the  hash  table  entry 
that  the  hashing  function  first  found.  If  the  record  of  that 


individual  is  found  to  be  connected  to  one  of  the 
neighbouring  occupied  hash  table  entries #  again  do  step  1-d. 

d.  In  any  of  the  above  cases#  either  the  record  of 
the  left  individual  was  found  to  be  already  present  or  was 
established  in  step  1-cr  establish  the  record  of  the  right 
individual  of  the  current  tuple  found  in  S.  Connect  it 
either  to  the  left  individual's  record  or  to  the  end  of  the 
bucket  (if  a  bucket  is  already  connected  to  the  left 
individual's  record).  Continue  to  search  for  the  tuples  in  S 
which  have  the  right  individual  mentioned  in  step  lb  as  their 
left  individual#  and  repeat  the  steps  lc  and  Id  for  these 
tuples. 

2.  Repeat  step  1  until  the  relation  R  is  exhausted.  Set 
a  pointer  to  each  hash  table  entry  occupied  and  put  that 
pointer  into  a  temporary  array  of  type  pointer.  (*  The 
result  of  the  steps  1  and  2  is  the  "adjacency  list" 
representation  of  the  resulting  relation  in  the  SCHT#  now  the 
remaining  steps  are  to  convert  that  representation  to  our 
table  representation.  *) 

3.  Do  step  3  of  the  algorithm  given  for  the  table 
representation  in  the  "union"  operation  for  relations. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

The  worst  case  complexity  function  of  that  algorithm  is 
as  follows * 
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f  -  i*(p*p>  +  L*(n*m)  +  c 

The  first  term  of  this  function  corresponds  to  steps  1  and  2 
of  the  algorithm*  where  "p"  is  assumed  to  be  the  size  of  both 
argument  relations.  The  second  term  corresponds  to  the  step 
3  of  the  algorithm  where  "m"  is  the  cardinality  of  the  RIM 
set  of  the  relation  S  and  *n*  is  the  relation  R's 
cardinality.  Me  know  that  in  the  worst  case: 
p  ■  n#n 

where  ”n"  is  the  cardinality  of  both  the  LEM  set  and  the  RIM 
set  of  the  relation  in  question.  Let  m*n*  then  the 
complexity  function  becomes: 
f  *  KMn*)  +  L*(n2)  +  c 

So  we  conclude  that  the  algorithm  has  the  worst  case 
asymptotical  complexity  behaviour  of  0(n*>.  This  is  a  very 
expensive  algorithm.  The  reason  is*  we  have  exhaustively 
searched  the  second  relation's  table  for  each  tuple  of  the 
first  argument  relation. 

I.  SECOND  ANCESTRAL  (san:R) 

This  operation  takes  a  relation  identifier  and  produces 
another  relation  which  is  the  second  ancestral  (transitive 
closure)  of  the  given  relation. 

The  algorithm  for  the  Hash-Incidence-Vector 
representation  makes  use  of  MARSHALL'S  algorithm  for  bit 
matrices.  (Ref.  4]  Marshall's  algorithm  for  incidence 
matrices  can  be  defined  as  follows: 
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Inputs  A  is  the  nXn  incidence  aatrix  of  the  given 
relation#  where  "n"  is  the  cardinality  of  the  MEM  set  of 
the  given  relation.  (*  This  aeans  the  "row  set"  and  the 
"coluan  set"  of  the  incidence  aatrix  are  the  saae  and  are 
the  MEM  set  of  the  relation.  *) 

Outputs  R#  the  transitive  closure  of  A#  also  as  an 
incidence  aatrix  on  the  MEM  set  of  the  given  relation. 

Let  k  represent  the  coluan  nuaber  and  i  represent  the  row 
number.  Let  Rik  denote  the  entry  of  the  incidence  aatrix  at 
row  i  and  coluan  k.  Let  Ri  be  the  i'th  row  of  R  for  0<i<n+l 
and  let  V  denote  the  OR  operation  on  the  rows  of  incidence 
aatrix. 

Algorithm  Transitive  (input#  output) 

R  -  A 

for  k  -  1  to  n  do 

for  i  -  1  to  n  do 

if  Rik  ■  1  then  Ri  -  Ri  V  Rk 
end  do 
end  do 

Note  that  Marshall's  algoritha  is  defined  for  square  aatrixes 
but  that  this  is  not  the  case  for  our  incidence  vector.  Our 
incidence  vector  originates  from  a  different  representation 
of  the  incidence  aatrix  which  is  not  necessarily  a  square 
aatrix.  According  to  our  definition  of  the  incidence  vector# 
given  an  incidence  aatrix#  if  we  convert  it  to  the  incidence 


102 


If] 
I  ,J 


•  V  > 

wr# 


vector  representation,  the  row  set  of  the  incidence  aatrix 


# 


corresponds  to  the  LBN  set  of  the  incidence  vector  and  the 
coluan  set  of  the  incidence  aatrix  corresponds  to  the  RIM  set 
of  the  incidence  vector.  Note  that  the  LBN  set  and  the  RIN 
set  of  a  relation  are  not  necessarily  the  saae. 

In  order  that  the  incidence  vector  of  the  transitive 
closure  of  a  relation  be  different  froa  the  incidence  vector 
of  that  relation,  the  LBN  set  and  the  RIN  set  of  that 
relation  aust  not  be  disjoint,  otherwise  the  transitive 
closure  of  that  relation  has  the  saae  incidence  vector  as  the 
original  relation's  incidence  vector. 

Now  we  have  to  aodify  War  shell's  algor  itha  for  our  case. 
Note  that  our  incidence  vectors  are  aore  efficient  in  storage 
usage  than  the  bit  aatrices  used  by  the  Warshall's  algoritha, 
and  still  Warshall's  algoritha  works  without  an  overhead  in 
tiae. 


The  algoritha  for  Hash-Incidence-Vector  representation  is 


as  follows * 


1.  Find  the  relation's  record  in  the  relation  table  by 
hashing  with  the  given  relation  identifier  to  the  relation 


table. 


2.  Follow  the  PFLN  field  of  that  record  and  find  the 


first  left  aeaber's  record  in  the  LHT. 

3.  Allocate  a  block  of  aeaory  as  large  as: 
( | LEH| * |rin| )/C 


B® 


where  *Ca  is  the  aeaory  word  length,  ■ | LEM | *  is  the 
cardinality  of  the  LEM  set  of  the  given  relation  and  ”|rxm|" 
is  the  cardinality  of  the  RIM  set  of  the  given  relation. 
Record  the  beginning  address  of  that  aeaory  block. 

4.  Start  froa  the  beginning  of  the  LEM  set  of  the  given 
relation.  For  each  LEM  set  record  found  by  following  the 
TASK  links  between  the  records,  hash  with  the  individual 
represented  by  that  record  into  the  RHT  under  the  given 
relation's  identifier. 

a.  If  the  RIM  set  of  that  relation  also  contains 
that  individual,  get  the  index  of  the  LEM  set  record  which 
represents  the  individual  in  question,  and  call  it  "INDEX". 
Copy  INDEX  to  INDBX+I  RIM  | -1  bits  of  the  incidence  vector  of 
the  relation  to  the  corresponding  bits  of  new  incidence 
vector . 

b.  Otherwise  set  a  pointer  to  the  LEM  record  of  that 
individual  and  put  it  into  a  temporary  array  of  type  pointer, 
get  the  index  of  that  record  and  call  it  "INDEX"  and  put 
zeros  into  the  INDEX  to  INDEX+ 1  RIM  | -1  bits  of  the  new 
incidence  vector. 

5.  Repeat  step  4  until  the  LEM  set  of  the  relation  is 
exhausted. 

6.  Start  froa  the  beginning  of  new  incidence  vector  and 
execute  the  loop  below  on  the  new  incidence  vector. 
(*  NOTATION*  In  the  algorithm  segment  below,  "VECTOR [ i , j  ]  " 


••ana  the  incidence  vector  location  corresponding  to  the  LEM 
set  individual  which  is  associated  with  the  index  i  and  the 
RXM  set  individual  which  is  associated  with  index  j,  the 
■VBCTOR[i,j]  to  VBCTOR[i,k]*  means  the  cluster  of  incidence 
vector  entries  (bits)  beginning  with  the  entry  VECTOR! i#j] 
and  ending  with  the  entry  VBCTOR[i#k] .  *) 

for  countl  -  1  to  |lem|*|rxm|  by  |rxm|  do 
for  counts  *  1  to  (RXM{  do 

if  VBCTOR (countl ,  count2]  ■  1  then 

VECTOR tcountl,  1]  to  VBCTOR [countl ,  | RIM ( ]  - 
VECTOR [count 2,  1]  to  VBCT0R[count2 #  |RIH|]  V 
VECTOR [countl#  1]  to  VBCTOR [countl #  |RIM|] 

end  if 
end  do 
end  do 

?.  Start  from  the  beginning  of  the  temporary  pointer 
array  and  find  the  record  of  each  left  member  that  is  not 
present  in  the  RXM  set  of  the  relation  by  following  the 
pointers  in  turn*  For  each  record  found  in  this  manner 
extract  the  index  of  that  record#  call  it  IND8X  and  OR  the 
IJIDBX  to  XMDSX+ 1  RXM  |  *1  bits  of  the  original  incidence  vector 
with  the  corresponding  bits  of  the  new  incidence  vector. 


8.  Make  separate  copies  of  the  LEM  set  and  the  RIM  set 
of  original  relation  in  the  LHT  and  in  the  RHT  respectively 
under  the  new  relation  identifier  "san:R"  as  was  done  in  the 
previous  algorithms. 

9.  Hash  to  the  relation  table  with  the  new  relation's 
identifier  and  establish  its  record.  Put  the  pointers  to  the 
new  LEM  set  and  new  RIM  set  into  the  PFLM  and  the  P7RM  fields 
of  that  record  respectively.  Copy  the  |LEM|  and  |RIM|  fields 
of  the  original  relation's  record  into  the  corresponding 
fields  of  the  new  relation's  record.  Put  the  beginning 
address  of  the  new  incidence  vector  into  the  "base"  field  of 
that  record. 

Now  we  will  do  the  worst  case  asymptotical  complexity 
analysis  of  this  algorithm. 

We  see  that  the  algorithm  is  not  as  costly  as  it  is 
expected  to  be.  The  worst  case  complexity  function  of  this 
algorithm  can  be  written  as: 

f  ■  L*n*(n/C)  +  M*m*n*(n/C)  +  N*n  +  t*m  +  D 
where: 

m  *  The  cardinality  of  the  LEM  set  of  the  given  relation. 

n  *  The  cardinality  of  the  RIM  set  of  the  given  relation. 

C  ■  Memory  word  length. 

L  ■  The  constant  number  of  memory  references  made  while 
copying  or  OR'ing  the  clusters  of  incidence  vector 
bits  for  each  LEM  set  individual  in  steps  4,  5  and  7. 


M  ■  The  constant  number  of  memory  references  made  for 
each  iteration  of  outermost  "for"  loop  in  step  6. 

N  »  The  constant  number  of  memory  references  made  while 
copying  each  RIM  set  individual  in  step  8. 

T  *  The  constant  number  of  memory  references  made  while 
copying  each  LEM  set  record  in  step  8. 

D  ■  The  constant  number  of  memory  references  made  by  the 
remaining  steps. 

In  the  above  function  the  first  term  corresponds  to  steps  4, 
5  and  7,  the  second  term  corresponds  to  step  6,  the  third  and 
fourth  terms  corresponds  to  step  8,  and  the  last  term 
corresponds  to  the  remaining  steps  of  the  algorithm. 

In  writing  the  above  complexity  function  we  assumed  that 
in  the  worst  case  the  temporary  pointer  array  will  be  empty 
because  in  the  worst  case  the  OR  operations  done  in  step  6 
should  be  maximum.  That  means  when  we  first  make  the 
separate  copy  of  the  original  relation's  incidence  vector 
(which  is  the  primitive  form  of  the  new  relation's  incidence 
vector);  in  the  worst  case  it  should  consist  of  all  one's. 

Let  m*n,  L/C  *  H,  H+T  *  S  and  M/C  »  1,  then  the  above 
function  becomes: 

f  *  I*(n3)  ♦  H*(n2)  +  S*n  +  D 

Me  conclude  that  the  worst  case  asymptotical  time  complexity 
behaviour  of  this  algorithm  is  0(n3).  In  fact  the  algorithm 
becomes  an  order  three  algorithm  if  n  >  C»  otherwise  it  can 


be  viewed  as  an  order  two  algorithm#  since  when  n  >  C  we  have 
to  make  (n/C)  memory  references  for  each  OR  operation.  Again 
making  use  of  pipelining  and  cache  memory  is  advantageous  in 
this  case. 

Now  we  will  define  the  algorithm  for  the  table 
representation.  We  will  again  use  SCHT  mechanism  for  this 
algorithm.  The  algorithm  is  as  follows: 

1.  Call  the  algorithm  "mem”/  compute  the  cardinality  of 
the  resulting  MEM  set  of  the  relation  and  record  it. 

2.  Start  from  the  beginning  of  the  given  relation's 
table.  Por  each  right  individual  found  by  following  the 
links  between  the  records  and  by  extracting  the  right  indivi¬ 
dual  represented  by  the  "right”  field  of  each  record#  hash 
into  the  SCHT  with  that  individual#  create  an  SCHT  record  for 
that  individual  and  connect  it  directly  to  the  SCHT  entry 
found.  In  the  case  of  a  collision#  use  the  rehashing 
technique.  Create  a  SCHT  record  for  the  left  individual  of 
the  current  tuple  and  link  the  record  of  the  right  individual 
to  that  record  by  its  collision  link.  (*  Note  that  the  SCHT 
records  corresponding  to  the  left  individuals  are  not 
connected  to  the  SCHT  entries.  *)  If  after  hashing  with  the 
right  individual  it  is  found  out  that  a  record  of  that  right 
individual  is  already  present#  then  add  the  record  of  the 
current  left  individual  to  the  end  of  the  bucket  connected  to 
this  right  individual's  record.  Link  the  right  individuals' 
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records  to  each  other  by  their  TASE  links.  Mark  the 
beginning  of  the  resulting  linked  list  with  the  pointer  P. 
(*  As  a  result  of  step  2,  the  relation  is  represented  in  the 
SCHT  without  repetitions  of  the  right  individuals.  All  the 
SCHT  records  of  the  left  individuals  that  are  related  with  a 
right  individual  have  been  established  in  a  bucket  connected 
to  the  record  of  this  right  individual.  *) 

3.  Por  I  *  1  to  | MEM |  by  1  do 

a.  Start  from  the  beginning  of  the  linked  list 
structure  which  connects  the  right  individuals*  records  in 
the  SCBT.  By  following  the  pointer  P  proceed  in  this  linked 
list  structure  record  by  record.  For  each  record  found  in 
this  manner  do  step  3b. 

b.  Follow  the  pointer  found  in  the  collision  link  of 
the  current  right  individual's  record  and  find  the  left 
individual's  record  (which  is  in  relation  with  the  current 
right  individual) . 

c.  Extract  the  individual  represented  by  the  record 
found  in  step  3b;  hash  with  this  individual  to  the  SCHT.  If 
this  individual  is  represented  by  an  SCHT  record  which  is 
directly  connected  to  the  SCHT  entry  found/  follow  the 
collision  link  of  the  record  found  and  find  the  SCHT  record 
connected  to  this  record. 

d.  Extract  the  individual  represented  by  the  record 
found  in  step  3c.  Search  for  this  record  in  the  bucket 
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connected  to  the  right  individual's  SCHT  record  found  in  step 
3a.  If  this  individual  is  not  represented  by  a  record  in 
this  bucket;  create  a  SCHT  record  for  this  individual  and  add 
it  to  the  end  of  the  bucket. 

e.  Proceed  in  the  bucket  connected  to  the  record 
found  in  step  3c  and  for  each  record  found  do  step  3c. 

f.  After  the  bucket  is  exhausted  do  steps  3c  through 
3e  for  the  next  record  of  the  bucket  connected  to  the  SCHT 
record  found  in  step  3b.  (*  After  the  execution  of  the  above 
steps,  the  transitive  closure  of  the  given  relation  appears 
in  the  SCHT  in  the  adjacency  list  representation.  Next  it 
has  to  be  converted  to  the  table  representation.  *) 

4.  Construct  the  table  representation  of  the  transitive 
closure  of  the  relation  by  looking  at  the  arrangement  of  the 
records  of  the  individuals  in  SCHT  (as  was  done  in  the  step  5 
of  the  algorithm  given  for  the  complement  operation  on 
relations) . 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

Before  going  into  the  detail  of  the  complexity  function, 
we  have  to  define  what  would  be  the  worst  case.  In  the  worst 
case  the  relation  is  a  universal  relation  on  its  MEM  set.  In 
that  case  the  buckets  constructed  in  SCHT  have  a  common 
length,  which  is  equal  to  the  cardinality  of  the  members  set 
(MEM)  of  the  relation.  Let  this  length  be  ”n”,  which  means 
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the  cardinalities  of  tha  MSN*  RIM  and  LEM  sets  of  the 
relation  are  all  equal  to  n.  So  in  step  3  we  sake  a  number 
of  memory  references  proportional  to  n  for  each  record  found 
in  step  3e  and  we  make  a  number  of  memory  references 
proportional  to  the  square  of  n  in  step  3f  by  repeating  step 
3e  n  times.  In  the  same  way  we  make  a  number  of  memory 
references  proportional  to  the  cube  of  n  in  step  3a  by 
repeating  step  3f  n  times  for  each  right  member  of  the 
relation.  Since  the  step  3  also  Iterates  n  times;  in  step  3 
we  make  a  number  of  memory  references  proportional  to  (n*). 
So  we  write  the  worst  case  complexity  function  of  this 
algorithm  as  follows: 

f  -  K*(n*)  +  L*(n2)  ♦  o 

where  the  first  term  corresponds  to  step  3,  the  second  term 
corresponds  to  step  1,  2  and  4,  and  the  last  term  corresponds 
to  the  number  of  memory  references  made  by  the  housekeeping 
operations.  The  constant  K  represents  the  constant  number  of 
memory  references  made  for  each  iteration  of  step  3.  The 
constant  L  represents  the  averaged  number  of  memory 
references  made  for  each  tuple  of  the  original  and/or  the 
resulting  relation  (whichever  is  larger)  in  step  1*  2  and  4. 

As  can  be  seen  the  algorithm  is  a  very  costly  algorithm. 
The  high  cost  of  this  algorithm  is  caused  by: 

1.  Repeated  execution  of  relative  product  operation  on 
the  relation  itself  and  on  each  intermediate  relation  found. 


2.  Removing  duplicate  tuples  free  the  intermediate 


relations. 

This  algorithm  utilises  the  SCHT  mechanism  for  removing  the 
duplicate  tuples  from  the  intermediate  relations. 

Suppose  we  defined  the  algorithm  in  a  straight  forward 
manner)  i.e.,  the  algorithm  obtains  the  transitive  closure  by 
first  getting  the  relative  product  of  the  relation  with 
itself  (It?)  and  taking  the  union  of  the  resulting  relation 
and  the  original  relation  to  obtain  the  next  intermediate 
relation,  and  so  on.  If  we  do  not  remove  duplicate  tuples 
from  the  intermediate  relations,  in  the  worst  case  defined 
above  each  intermediate  relation's  size  becomes  two  times 
greater  than  the  previous  intermediate  relation's  size,  and 
the  algorithm  becomes  an  0(2n)  algorithm  automatically.  On 
the  other  hand  if  we  remove  duplicate  tuples  from  each  inter¬ 
mediate  relation  without  using  SCHT  mechanism,  in  the  worst 
case  defined  above  an  operation  of  this  kind  has  an  0(p2) 
algorithm  (where  "p”  is  the  relation  size).  If  we  accept  the 
cardinality  of  the  MSM  set  of  the  relation  as  a  measure,  the 
algorithm  may  be  viewed  as  0(n<)  algorithm.  He  can  define  an 
o(n5)  algorithm  for  the  second  technique  described  above. 
Because  for  each  step  of  that  algorithm  we  will  have  to 
execute  the  algorithm  that  removes  duplicate  tuples,  the 
resulting  algorithm  becomes  an  0(n9)  algorithm,  so  our 
previous  algorithm  can  be  viewed  as  a  relatively  efficient 
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E  ^vV  algorithm  for  obtaining  tha  transitive  closure  on  the  table 
|  representation#  but  it  is  so  expensive  that  it  is  not 

feasible  to  implement  it  at  all. 
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Zn  this  study  we  will  tcy  to  find  out  efficient  ways  to 
represent  intermediate  relations  and  sets  that  result  from 
relational  operations  and  set  operations.  Earlier  when  we 
focused  on  the  extensional  representation  techniques ,  we 
explicitly  constructed  the  representation  structures  for  the 
intermediate  relations  and  sets  in  the  memory.  What  we  will 
try  to  do  now  isf  not  to  represent  intermediate  relations  and 
sets  in  the  memory  explicitly  and  still  be  able  to  execute 
the  relational  expressions. 

Zn  order  to  do  the  relational  operations  and  set 
operations  without  explicitly  representing  the  relations  or 
sets  resulting  from  those  operations,  it  requires  the 
complete  establishment  of  the  relational  language's  syntax. 
After  the  syntax  is  established  we  can  decide  on  the  suitable 
compiler  or  interpreter  design  and  we  can  compile  or 
interpret  the  source  expression  so  that  the  code  which  does 
the  relational  operations  and  set  operations  without 
constructing  the  extensional  representation  structure  of  the 
resulting  relation  or  set  can  be  produced  and/or  executed. 
Because  the  syntax  of  the  language  is  subject  to  changes,  we 
will  not  go  into  the  compiling  or  interpreting  issues  in 


detail;  instead  we  will  define  the  algorithms  for  the  code 
that  does  the  relational  operations  and  set  operations 
intensionally. 

Mow  we  have  to  define  what  is  the  basic  schema  we  have  in 
mind.  We  assume  that  the  relational  language  expression  is 
fully  parenthesised  or  there  exists  a  default  convention 
(left  to  right/right  to  left)  which  causes  the  expression  to 
be  parsed  as  if  it  is  parenthesised  by  using  this  convention. 
The  recursive  descent  parser  parses  the  expression  and  for 
each  pair  of  operators  found  in  this  manner  calls  the 
appropriate  code  that  we  will  give  the  algorithm  for.  So  we 
directly  execute  the  source  expression  while  parsing. 
Example x 

Suppose  the  parser  is  to  parse  the  expression  below , 

((R-S)  |  (TSU) ) tx 

The  scanner  finds  the  *|”  and  ”x”  operators  first,  and  passes 
the  tokens  to  the  parser.  The  parser  then  calls  the  routine 
defined  for  the  ”|"  and  ”x”  operator  pair.  This  routine 
directs  the  scanner  for  finding  the  operands  and  the 
argument,  then  the  scanner  returns  the  token  for  the  operator 
detected  in  the  first  operand  relation  (R-S)  which  is  the 
token  for  The  routine  attaches  the  appropriate  tokens 
and  argument  to  this  token  using  the  argument  given  to  it  and 
calls  the  routine  defined  for  the  resulting  operator  pair  (In 
this  case  the  routine  associated  with  the  and  "x" 
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operator  pair  ia  invoked  with  the  argument  being  "x"  and  the 
operand  relations  being  the  R  and  S).  If  this  routine  does 
not  return  a  valid  individual,  the  saae  thing  is  done  for  the 
second  operand  (composite)  relation.  If  an  individual  is 
obtained  by  calling  any  of  those  routines,  this  individual  is 
either  returned  to  the  caller  or  sent  to  the  output. 

Now  we  have  to  explain  this  mechanism  in  general.  We  can 
view  the  parser  as  the  collection  of  routines.  After  the 
first  operator  pair  is  found  the  related  routine  is  called, 
and  this  routine  gets  the  tokens  it  needs  by  directing  the 
scanner}  it  attaches  the  appropriate  tokens  to  those  tokens 
and  calls  the  routines  related  to  the  resulting  operator 
pairs  with  the  appropriate  arguments  possibly  obtained  from 
its  own  argument.  Bach  routine  called  may  do  the  same  thing 
by  directing  the  scanner  to  get  additional  tokens  and 
relation  identifiers  from  the  source  code.  This  process 
continues  until  the  primitive  operations  can  be  done  on  the 
extensionally  or  intensionally  represented  relations  and 
sets,  then  each  routine  returns  the  result  to  its  caller,  the 
caller  performs  the  necessary  evaluations  on  the  results  that 
are  obtained  by  calling  other  routines,  and  returns  the 
result  obtained  to  its  caller.  This  process  continues  until 
the  final  result  is  sent  to  the  output. 
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As  can  bs  se«n  in  th«  above  example,  each  routine  knows 
what  it  is  doing  and  mechanically  does  its  job  by  calling 
other  routines  defined  for  other  operator  pairs.  Note  that  a 
routine  may  call  itself  again  in  some  depth  of  calling  chain. 
This  process  stops  when  the  primitive  operations  can  be 
executed  on  the  extensionally  represented  relations  and  sets. 
In  fact,  some  primitive  operations  can  also  be  executed  on 
the  intensionally  represented  relations,  which  will  be 
discussed  later. 

So  in  summary,  the  process  defined  in  the  above  example 
is  aimed  at  reducing  the  initial  compound  relational 
expression  into  easily  manageable  primitive  relational 
operations  and  the  membership  tests  on  the  extensionally  or 
intensionally  represented  relations  and  sets.  Note  that  the 
above  process  is  just  a  particular  case  of  the  use  of  the 
algorithms  that  we  will  define;  these  algorithms  can  be 
adapted  to  the  other  implementation  schemas.  So  in  defining 
our  algorithms  we  will  not  go  into  the  detail  of  the 
implementation  technique  we  proposed. 

Because  there  are  no  less  than  136  possible  permutation 
of  operator  pairs,  if  we  create  a  tiny  routine  for  each  case, 
we  may  cause  trashing.  In  fact  we  do  not  have  to  introduce 
that  many  routines  into  the  system;  first  of  all  most  of  the 
136  routines  do  very  simple,  well  defined  reductions.  In 
addition  the  interpreter  we  have  in  mind  directly  executes 


the  source  code.  Under  those  considerations  we  may 
preprocess  the  source  code  by  directly  making  simple 
reductions  on  the  source  code.  This  mechanism  simply  takes  a 
part  of  the  source  code  and  replaces  this  part  with  a  reduced 
expression  whenever  it  finds  a  part  of  the  source  code  which 
is  reducable  with  the  rules  in  hand.  So  executing  this  text 
substitution  mechanism  separately*  leaves  us  less  work  in 
interpreting  the  source  code.  But  this  is  not  enough  to  make 
the  system  simple  and  compact.  So  we  take  advantage  of  the 
regularities  in  the  operations  and  generalize  some  operations 
so  that  we  do  not  have  to  define  all  possible  operator  pairs 
involving  that  operation.  That  leaves  us  a  total  62  special 
cases  or  in  other  words  operator  pairs  to  be  specially 
treated  or  requiring  an  algorithm. 

The  second  issue  in  that  kind  of  mechanism  is*  to 
generate  the  individuals  of  an  intensionally  represented  set 
one  at  a  time  as  they  are  requested.  This  becomes  necessary 
in  some  relational  operations  that  require  the  individuals  of 
the  argument  set  in  order  to  accomplish  their  job.  In  some 
cases  we  want  to  learn  only*  if  a  given  individual  is  in  an 
intensionally  represented  set  or  not.  In  this  case  we  can  do 
membership  test  with  less  cost*  although  some  configurations 
of  composite*  intensionally  represented  sets  still  forces  us 
to  produce  the  individuals  of  that  intensionally  represented 
sets  explicitly  in  order  to  do  the  membership  test.  Thus  in 
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some  operations r  (like  the  "Image*  operation)  we  will  have  to 
define  algor ithm  for  both  the  membership  test  and  the 
production  of  the  individuals  of  the  sets  produced. 

In  defining  that  mechanism#  first  we  wrote  down  all  the 
possible  operator  pairs#  we  separated  out  the  ones  that  can 
be  handled  in  the  preprocessing  phase#  then  we  generalised 
some  of  the  operations  and  treated  the  reaaining  operator 
pairs  as  special  cases. 

In  the  following  sections  we  will  explain  some  concepts# 
such  as  generalisation  of  operations  and  the  structure  of  the 
system#  further. 

A.  PREPROCESSING 

The  complete  listing  of  preprocessing  rules  is  given  in 
Appendix  C.  In  this  section  we  will  explain  how  the 
mechanism  works  by  giving  examples. 

The  preprocessing  mechanism  scans  the  source  code  and# 
whenever  it  finds  a  pattern  matching  one  of  the  rules  that  it 
knows  about#  makes  the  necessary  modifications  to  the  source 
code.  To  make  this  clear  we  will  give  an  example.  Example: 

Suppose  our  source  code  contains  the  segment  of  code 
below: 

((RS)  -  (final:T) )c 

The  preprocessing  mechanism  finds  the  operators  (by  counting 
the  parenthesis)  "ca  (converse)  and  "-"#  and  the  rule  defined 


for  this  case  is  applied.  In  this  case  the  applicable  rule 
is: 

(R-S)  c  ■  Rc  -  Sc 

So  the  preprocessing  mechanise  makes  a  separate  copy  of  the 
character  string,  " (RS) ",  concatenates  the  character  ”c”  to 
it,  takes  the  character  string  identifying  the  second  operand 
relation  which  is  "(finalsT)”,  and  concatanates  it  with  the 
character  "ca.  The  preprocessor  inserts  the  character 
between  the  two  character  strings  obtained  and  attaches  the 
parenthesis  to  both  ends  of  the  resulting  character  string. 
Hence  the  resulting  character  string  replacing  the  original 
segment  of  source  code  becomes: 

( (RS) c  -  (final :T)c) 

So,  as  can  be  seen  the  mechanism  is  fairly  mechanized  and 
simple.  All  the  preprocessing  mechanism  has  to  do  is  to 
count  the  number  of  paranthesis  and  find  the  operator  pairs 
for  which  the  preprocessing  rules  are  defined.  Then  it  makes 
the  necessary  modifications  in  the  source  code.  Note  that 
the  preprocessing  program  has  almost  no  intelligence;  on  the 
contrary,  it  does  the  reduction  mechanically. 

We  want  to  emphasize  that  the  preprocessing  program  does 
not  work  only  once  and  can  be  implemented  so  that  it  makes 
more  than  one  pass  on  the  source  code  in  order  to  apply 
preprocessing  rules  further  if  the  first  pass  yields  a  source 
code  that  can  be  further  preprocessed,  or  it  may  be  executed 


on  a  segment  of  source  code  upon  request  of  any  routine  of 
the  interpreter. 

B.  GENERALIZATION  OF  OPERATORS 

We  can  divide  the  relational  operators  into  two  groups: 

1.  The  operators  that  construct  the  new  relations  from 

the  operand  relations. 

2.  The  operators  that  construct  sets. 

We  list  the  operators  of  the  first  kind  as  follows: 

a.  R6S  (Relation  intersection) . 

b.  R|S  (Relation  union). 

c.  R-S  (Relation  difference) . 

d.  Rc  (Converse  of  a  relation) . 

e.  non:R  (Complement  of  a  relation). 

f.  R | | S  (Parallel  application). 

g.  R*S  (Dual  application). 

h.  fan:R  (First  ancestral  of  a  relation) . 

i.  san:R  (Second  ancestral  of  a  relation). 

j.  R::G  (Meta  application). 

k.  R:x  (Function  ? pplication) . 

l.  r\c  (Relation  right  restricted  to  the  set  C) . 

m.  C/R  (Relation  left  restricted  to  the  set  C.) 

n.  C/R\C  (Relation  restricted  to  the  set  C) . 

o.  final :R  (Final  members  of  relation  R) . 

In  the  above  listing  we  can  select  any  of  the  composite 
relations  and  substitute  another  composite  relation  into  any 
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operand  and/or  argument  of  it.  In  this  manner  we  can  combine 
any  number  of  relations.  As  an  example  we  will  make  a  few 
substitutions  in  order  to  obtain  a  new  composite  relation  out 
of  the  other  composite  relations.  Suppose  we  selected  item 
(i)  which  is  ”san:R”  and  substituted  the  relation  "R&S"  into 
the  argument  R,  we  obtain: 
san: (R&S) 

and  suppose  we  further  substituted  "C/R"  into  the  R  and 
”R||S"  into  the  S  in  the  above  composite  relation.  We 
obtain: 

san: ( (C/R) & (R| | S) ) 

Now  we  have  to  list  down  the  relational  operations  that 
construct  s“ts.  These  operations  are  given  below: 

a.  R:x  (Function  application) . 

b.  lem:R  (Left  members  set  of  a  relation) . 

c.  rim:R  (Right  members  set  of  relation  R) . 

d.  R!:x  (Image). 

e.  unimg:R  (Unit  image). 

f.  unimg':R  (Unit  coimage) . 

g.  init:R  (Initial  members  of  relation  R) . 

Definition:  A  composite  set  is  a  kind  of  set  which  is 

expressed  in  terms  of  composite  relations  and  relational 
operations. 

We  can  substitute  any  composite  relation  into  the  R  in 
the  above  cases.  (Arguments  shown  as  ”x”  means  that  the 
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argument  is  individual).  Note  that  the  "Function 
Application”  operation  is  included  in  both  cases,  because  it 
may  produce  an  individual  which  can  be  a  set,  a  relation,  an 
integer,  a  real,  a  character  string  or  a  bit  string.  We  can 
combine  the  above  composite  sets  by  using  the  set  operations 
"Union",  "intersection"  and  "difference".  By  attaching  the 
negation  sign  in  front  of  each  of  the  above  composite  sets  we 
get  the  complements  of  those  sets  which  are  again  viewed  as 
composite  sets.  The  user-defined  (primitive)  extensionally 
or  intensionally  represented  sets  can  participate  in  the  set 
operations  with  the  composite  sets.  So  we  can  define  an 
infinite  number  of  composite  sets. 

As  we  mentioned  earlier,  considering  each  of  the  possible 
pair  of  operators  causes  the  system  to  be  very  complex,  so  we 
generalize  the  operations  of  the  first  kind  in  order  to 
recover  from  doing  that.  In  order  to  make  this  concept  clear 
we  will  take  a  specific  case  and  explain  what  we  mean  by 
generalization . 

As  a  specific  composite  relation  let's  take  the  relation 
"fan:R".  We  can  substitute  14  other  relational  operations 
(except  meta  application)  into  the  R  including  "fan:R” 
itself.  Because  each  of  the  composite  relations  constructed 
in  this  manner  can  participate  in  the  operations  of  the 
second  kind  or  the  first  kind,  we  would  be  unable  to  cover 
all  the  cases  that  may  be  defined  by  the  programmer.  So  in 
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the  generalization  of  "fansR"  we  will  define  an  algorithm  for 
■fan:R"  for  each  of  the  operations  of  the  second  kind.  Por 
example  we  will  define  an  algorithm  for  the  operation, 
(fan:R)lx  which  is  constructed  by  using  1  and  fan  operations. 
The  !  operation  is  the  kind  of  operation  that  constructs  a 
composite  set  and  the  fan  operation  is  the  kind  of  operation 
which  constructs  the  composite  relation.  That  means  the  ! 
operation  is  of  the  second  kind  and  fan  operation  is  of  the 
first  kind.  These  algorithms  will  be  defined  in  terms  of  the 
operations  of  the  second  kind  working  on  the  operand  relation 
R.  So  an  algorithm  defined  for  the  operator  pair  <fan,!>  or 
in  other  words  for  the  operation  (fan:R)l:  applies  the 
operations  of  the  second  kind  to  the  relation  substituted  for 
"R".  Because  the  algorithms  of  these  operations  are  also 
defined  for  this  relation,  the  i  (image)  operation  on  "fan:R* 
can  be  done  with  no  confusion.  Note  that  the  relation 
substituted  for  ”R*  may  be  another  composite  relation 
constructed  from  the  operations  of  the  first  kind  and  the 
same  rule  applies  to  this  relation  as  it  was  in  the  relation 
•fan:R"  case.  This  rule  is  applied  until  the  operations  of 
the  second  kind  can  be  done  on  the  extensionally  or 
intensionally  represented  user  defined  relation.  Example: 

Suppose  we  have  the  relational  expression: 

(fan: (san:R) ) ! :x 
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The  prim*  operators  in  this  expression  are  the  left-most  fan 
operator  and  the  1  operator.  In  this  case  the  algorithm 
defined  for  (fan:R)l:  operation  is  invoked.  Suppose  further , 
this  algorithm  uses  unimg  operation  (which  is  of  the  second 
kind)  on  the  operand  relation  (which  is  the  santR)  so  the 
operation  to  be  performed  is  unimg:  (san:R)  and  the  system  has 
an  algorithm  defined  for  the  <unimg,san>  operator  pair.  Note 
that  the  algorithm  defined  for  the  <fan,l>  operator  pair  does 
not  care  about  the  kind  of  composite  operand  relation  and  no 
matter  how  complex  this  operand  relation  is,  it  simply 
applies  the  operations  of  the  second  kind  to  the  operand 
relation  (in  this  case  santR)  in  order  to  do  its  job.  That 
is  what  we  mean  by  generalization. 

C.  THE  ALGORITHMS  POR  GENERATING  THE  INDIVIDUALS  OP 

COMPOSITE  SETS 

In  this  chapter  we  will  explain  in  detail  how  we  produce 
the  individuals  of  the  intensionally  represented  intermediate 
sets.  In  addition  to  that  we  may  want  to  test  if  a  given 
individual  is  in  an  intensionally  represented  set.  As  long 
as  we  are  able  to  produce  all  the  individuals  in  intensional 
sets,  checking  for  membership  is  trivial.  But  producing  all 
the  individuals  in  these  kinds  of  sets  is  a  costly  operation 
and  in  some  cases  we  can  do  the  membership  test  with  less 
cost.  On  the  other  hand,  we  can  not  restrict  ourselves  to 
only  membership  tests  in  order  to  work  around  the  costly 


production  aechanisa  because  soae  of  the  relational 
operations  expect  each  individual  of  a  given  arguaent  set  in 
order  to  accoaplish  its  job.  We  will  need  to  produce  the 
individuals  of  intensionally  represented  interaediate  sets 
especially  in  the  "1*  operation  and  in  the  operations  on  the 
coapleaent  of  a  relation. 

In  the  production  of  the  individuals  of  an  intensionally 
represented  set#  defining  one  priaitive  function  is  very 
useful.  We  will  call  this  priaitive  function#  "Force".  When 
we  apply  this  function  to  an  intensionally  represented  set 
for  the  first  tiae#  it' returns  the  first  individual  of  this 
set.  Repeated  applications  of  this  function  to  this  set  will 
return  the  second#  the  third  individuals  in  turn#  and  so  on. 
Thus  the  function  aust  set  up  break  points  between  the 
production  of  the  individuals  of  the  intensionally 
represented  sets.  At  those  break  points  we  exaaine  and 
evaluate  the  individual  returned  or  check  if  it  satisfies 
soae  condition.  Another  objective  of  the  "Force”  priaitive 
is  that#  if  we  are  doing  the  aeabership  test  by  producing  the 
individuals  of  an  intensionally  represented  set  by  coaparing 
the  given  individual  with  each  individual  produced#  as  soon 
as  we  find  a  aatch#  we  can  quit  producing  the  individuals# 
thus  saving  ourselves  froa  producing  all  the  individuals  of 
the  intensionally  represented  set  in  question. 


Implementation  of  the  "Force"  primitive  is  fairly 
complex.  We  will  show  that  there  exists  at  least  one  way  to 
implement  the  "Force"  primitive.  Our  force  primitive*  in 
fact  does  everything  in  the  production  of  the  individuals  of 
an  intensionally  represented  intermediate  set. 

In  the  discussion  of  the  types  of  the  relational 
operations*  we  defined  two  types  of  relational  operations  and 
we  stated  that*  we  can  combine  the  two  kinds  of  relational 
operations  to  obtain  composite  sets.  In  addition  we  can  use 
the  set  operations  as  the  resulting  composite  sets  being  the 
operand  sets  in  order  to  obtain  other  composite  sets.  We  can 
easily  define  the  role  of  the  set  operations  in  the 
production  of  the  individuals  in  general  (i.e.*  we  can 
generalize  the  set  operations).  That  leaves  us  all  the 
distinct  configurations  of  composite  sets  that  can  be  created 
from  the  operations  of  the  first  kind  and  the  operations  of 
the  second  kind  by  substituting  the  operations  of  the  first 
kind  in  the  argument  and/or  operand  relation  (R)  of  the 
operations  of  the  second  kind.  Because  "init:R"  is  defined 
in  terms  of  "lem:R”  and  "rim:R"*  and  we  will  define  the 
algorithms  involving  the  function  application  operation  in 
Chapter  B*  in  this  chapter  we  will  only  focus  on  the 
operations  below: 
lem:R 
rim:R 
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R1  tC 
unimgtR 
unlag ' :R 

which  arc  of  tha  second  kind.  Zn  tha  aaaa  manner,  wa  will 
generalise  tha  operations  "Right  restriction",  "Left 
restrict ion”  and  "Restriction"  which  are  of  tha  first  kind. 
As  before,  tha  "Function  application”  operation  returns  a 
composite  relation  of  tha  first  kind  so  there  is  also  no  need 
to  treat  this  operation  as  a  special  case.  Those  reductions 
leave  us  the  remaining  operations  of  the  first  kind  which  are 
to  be  combined  with  the  operations  of  the  second  kind. 

We  will  define  the  "Force"  primitive  as  a  recursive 
function  which  includes  less  than  30  cases  defined  for  the 
distinct  permutations  of  the  first  and  second  kind  of 
operations.  Because  we  preprocess  the  source  code  many  of 
the  permutations  are  reduced  to  the  other  permutations  that 
we  will  be  defining  the  algorithms  for.  The  "Force”  function 
includes  a  big  case  statement  in  which  each  case  refers  to  a 
particular  permutation  (operator  pair)  that  will  be  treated 
specifically.  It  takes  the  character  string  that  identifies 
the  composite  set,  and  extracts  the  operator  pair  of  this 
composite  set.  It  identifies  the  argument  (if  there  is  any) 
and  invokes  its  appropriate  case  in  the  case  statement.  This 
•agment  of  casa  statamant  calls  the  function  "Force” 
recursively  by  subdividing  the  original  composite  sat  and 
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expressing  it  in  terms  of  operations  of  the  second  kind  on 
the  operand  relations/relation.  That  means  the  particular 
case  of  the  case  statement  mechanically  finds  the  operand 
relations  in  the  composite  set  expression,  creates  other 
composite  sets  by  using  those  relations  and  the  argument  (if 
there  is  any)  and  forces  the  composite  sets  by  calling  the 
function  "Force”  recursively  with  those  composite  sets  as  the 
arguments.  This  recursion  continues  until  an  operation  of 
the  second  kind  can  be  performed  on  an  extensionally 
represented  relation  or  set.  Mote  that  "Force”  has  to 
recognize  when  it  finds  a  primitive  extensionally  represented 
relation  or  set.  That  can  be  done  by  hashing  into  the 
relation  table  or  the  set  table  and  by  checking  if  a  record 
of  the  relation/set  exists  in  the  relation- table/set- table. 

We  stated  that  the  "Force”  function  returns  one 
individual  at  a  time.  This  feature  makes  the  program 
extremely  complex.  Mow  we  will  explain  how  that  mechanism 
works.  We  know  that  we  get  the  individual  to  be  returned 
from  an  extensionally  represented  relation  or  set  and  this  is 
the  stopping  condition  for  our  recursion.  Further,  after 
returning  an  individual,  we  have  to  remember  where  we  left  in 
order  to  respond  to  a  subsequent  reference  to  the  function 
"Force”  related  to  the  same  argument  composite  set  in  the 
same  context.  In  fact  memorizing  where  we  left  off  is 
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necessary  at  a  very  low  level,  i*e.,  in  the  operations  on  the 
extensionally  represented  relations  and  sets.  Example: 

Suppose  we  are  forcing  the  composite  set,  ria:(R-S)  where 
R  and  S  are  assumed  to  be  primitive,  extensionally 
represented  relations.  According  to  the  algorithm  defined 
for  the  operator  pair,  <rim,->  (which  will  be  given  later)  we 
first  force  lem:R-lem:S  and  get  the  first  individual  of 
lem:R-lem:S  say  *y".  We  force  the  composite  set  unimg':R:y 
next,  so  we  get  the  first  individual  and  return  it.  Suppose 
now  the  composite  set  rim:(R-S)  is  forced  again.  The 
individual  to  be  returned  is  the  next  individual  of  the 
composite  set  unimg':R:y.  So  we  have  to  remember  where  we 
left  off  in  the  unimg':R  operation. 

In  this  example,  we  can  solve  this  recognition  problem  by 
setting  a  global  pointer  to  the  record  of  the  next  individual 
to  be  returned  in  the  RIN  set  of  R.  Obviously  this  does  not 
solve  all  the  problems,  but  we  have  to  observe  one  fact  here: 
Suppose  our  original  composite  set  has  been  a  part  of  another 
composite  set  and  the  first  individual  we  produced  had  to  be 
tested  against  the  unimg':R:y  (composite  set)  in  another 
context.  That  means  we  are  in  the  situation  that  we  have  two 
same  operations  (unimg':R:y)  being  used  for  different 
purposes  in  the  same  expression.  An  example  for  this  case  is 
given  below: 

(unimg' :R:y)/G! : (unimg' :R:y) 
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In  execution  of  this  sxpcssslon  the  individual  obtained  by 
forcing  tha  coaposita  sat: 

Gt  t (uniag'  xRxy) 

is  tastad  if  it  is  in  tha  coaposita  sat  (which  is  on  tha  laft 
of  G)  uniag':R:y  or  not  (as  a  rasult  of  rastriction 
oparation).  So  tha  uniag' :R:y  operation  is  parforaad  for  two 
different  purposes  and  wa  parfora  each  oparation  while  wa 
still  save  tha  state  of  tha  other.  Tha  second  reference  is 
faced  with  tha  strange  fact  that,  even  though  it  is  unrelated 
with  forcing  tha  (uniag' iRs)  oparation  for  obtaining  tha 
second  alaaant  of  tha  coaposita  sat  uniag' xRxy  it  obtains  tha 
second  individual  of  tha  right-aost  coaposita  sat  uniag' xRxy, 
instead  of  tha  first  individual  of  tha  left-aost  coaposita 
sat  uniag'tRxy. 

Based  on  tha  above  facts  wa  have  defined  tha  aechanisa 
below  in  order  to  aake  this  work. 

Definition!  A  "high  level  coaposita  sat"  is  a  coaposita 
sat  which  is  either  defined  in  tha  source  coda  or  created  by 
an  internal  aechanisa  other  than  tha  function  "Force”.  Tha 
aechanisa  has  tha  below  structural 

1.  Naaorizing  tha  state  is  dona  in  these  operations 
whan  they  work  on  tha  extensionally  represented  relations: 

a.  Inage  oparation. 

b.  uniag  oparation. 

c.  uniag'  oparation. 


d.  left  members  operation. 

e.  right  meaU>ers  operation. 

2.  Bach  high  level  composite  set  is  associated  with  a 
global  count#  when  it  is  forced  for  the  first  time#  and  a 
system-wide  association  table  is  maintained  which  relates  the 
high  level  composite  sets  with  their  global  variables.  Each 
global  count  associated  with  a  high  level  composite  set 
begins  from  a  biased  value  so  that  the  values  of  the  global 
counts  associated  with  various  high  level  composite  sets  are 
restricted  to  the  particular  intervals.* 

3.  The  system  has  a  global  hash  table#  namely  MRASH 
table  (Memory  Hash  Table)  which  is  used  to  save  various 
pointers. 

4.  In  each  level#  when  a  composite  set  is  created,  an 
integer  taken  from  the  global  count  associated  with  the  high 
level  composite  set  (from  which  this  sub-composite  set  is 
originated)  is  saved.  When  a  pointer  is  set  to  a  record  at 
the  lowest  level#  this  pointer  is  saved  in  the  MHASH  table 
with  the  current  integer  being  the  identifier  with  which  we 
hash  into  the  MHASH  table.  (This  allows  us  to  save  different 
pointers  for  the  same  operation  and  the  same  extensional 
relation  or  set.  Hence  we  properly  distinguish  between  the 

‘Because  the  system  will  have  a  limit  on  the  number  of 
high  level  composite  sets  this  interval  can  be  defined  in  the 
implementation  phase. 


same  operations  that  request  the  same  things  from  the  same 
extensional  relations  but  have  different  originations). 

5.  When  a  new  composite  set  is  created*  the  pointer  to 
the  character  string  representing  this  composite  set  is  saved 
in  the  MHASH  table  with  the  current  integer  being  the 
identifier.  (This  allows  us  to  avoid  creating  the  same  sub- 
composite  sets  redundantly  in  the  subsequent  execution  of  the 
"Force"  function  related  to  the  same  high  level  composite  set 
and  originated  from  the  same  context). 

6.  If  there  is  no  individual  remaining  to  be' returned  in 
a  primitive  relation  or  a  set*  we  associate  0  with  the 
current  integer  in  the  MHASH  table*  instead  of  a  pointer*  and 
we  return  "nil"  to  the  caller  (which  is  the  function  "Force” 
in  itself). 

7.  After  each  force  operation  on  a  high  level  composite 
set*  the  global  count  associated  with  this  high  level 
composite  set  is  reset  to  0. 

So*  suppose  we  forced  a  composite  set*  and  we  got  the 
first  individual  of  this  composite  set;  if  we  force  this  set 
again*  the  function  goes  and  mates  the  same  calls  and  goes 
through  the  same  kind  of  counting  mechanism,  when  it  hits  a 
primitive  relation*  it  hashes  with  the  current  integer  into 
the  hash  table  and  looks  up  the  pointer.  If  it  is  not  zero* 
it  returns  the  next  individual  obtained  from  the  LBM  set  or 
the  RIM  set  of  the  primitive  relation  or  the  extensionally 


represented  primitive  set,  by  advancing  the  pointer 
identified  by  this  integer#  and  re-establishes  this  pointer 
in  the  hash  table  with  the  same  integer  being  the  identifier. 
If  there  is  no  pointer  but  0#  it  returns  "nil”  to  the  caller# 
in  which  case  the  caller  (the  function  "Force"  itself) 
proceeds  to  find  the  next  primitive  composite  set  to  continue 
to  produce  its  individuals#  if  any  remain.  (Note  that  we  are 
always  mentioning  the  function  "Force"  because  it  calls 
itself  recursively  with  different  arguments). 

So  when  we  force  a  composite  set  repeatedly  the  function 
"Force"  remembers  its  last  state  and  returns  the  next 
individual  of  the  composite  set  without  any  confusion. 
Because  the  calling  chain  has  to  reoccur  and  the  counting 
mechanism  always  follows  the  same  procedure#  we  do  not  get 
integers  different  from  the  ones  established  in  the  first 
pass. 

So  this  mechanism  uses  random  coding  principles#  and  the 
hashing  is  done  through  a  counting  process. 

Now  we  will  define  the  algorithms  for  all  possible  cases 
that  should  be  included  in  the  big  case  statement  of  the 
function  "Force".  As  we  explained  before#  many  kinds  of 
composite  sets  can  be  reduced  to  the  other  types  of  composite 
sets  or  primitive  ones  in  the  preprocessing  phase.  By 
primitive  we  mean  that  the  relations  and  sets  from  which  the 
composite  set  is  constructed  are  represented  extensionally. 


So  our  algorithms  will  refer  to  the  kind  of  composite  sets 
that  should  be  treated  specifically. 

In  these  algorithms  we  will  try  to  emphasize  the  fact 
that  the  individuals  are  returned  one  at  a  tiraef  but  in  some 
cases  it  is  more  descriptive,  if  we  explain  an  algorithm  as 
if  it  is  being  forced  repeatedly.  In  addition  we  will  not 
repeatedly  mention  the  state  saving  mechanism  which  saves  the 
pointers  in  the  MHASH  table,  with  the  identifiers  being  the 
integers  taken  from  the  global  count.  So  the  reader  should 
always  think  that  .pa  the  individuals  are  returned  one  at  a 
time  and  the  state  saving  mechanism  works  as  it  is  supposed 
to  do. 

In  these  algorithms  we  will  use  the  LHT  as  the  SCHT 
whenever  it  is  possible  to  produce  the  same  individuals 
repeatedly  as  a  result  of  forcing  a  composite  set  repeatedly. 
Hence  at  any  point  in  execution  we  remember  which  of  the 
individuals  of  the  composite  set  being  forced  have  been 
produced  as  a  result  of  previous  force  operations,  and  we  do 
not  produce  them  again.  This  mechanism  is  also  useful  in  the 
intermediate  operations  done  in  some  algorithms  and  reduces 
the  time  complexity  of  some  algorithms  in  some  cases. 

In  this  mechanism  we  construct  a  set  in  the  LHT  which 
contains  the  LHT  records  of  the  individuals  that  we  want  to 
save  temporarily.  After  no  forcing  operation  can  be  made  to 
a  composite  set  we  may  return  the  linked  list  structures  of 
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all  sets  constructed  in  the  LHT  for  this  composite  set  to  a 
storage  pool  (which  may  be  implemented  as  a  stack) ,  so  that 
the  subsequent  record  allocations  can  be  made  from  the 
storage  pool;  if  the  storage  pool  is  empty  then  the  records 
are  allocated  from  the  heap.  If  the  storage  pool  size 
exceeds  some  previously  defined  limit,  we  dispose  some  number 
of  records  from  the  storage  pool.  As  a  consequence,  this 
kind  of  mechanism  does  not  use  up  a  lot  of  memory  resources 
and  we  recover  from  doing  our  operations  redundantly. 

In  this  mechanism  we  will  again  use  the  integers  taken 
from  the  global  count  as  identifiers;  because  we  are 
distinguishing  the  sets  constructed  in  the  LHT  with  the 
relation  identifiers,  these  integers  will  be  treated  as  if 
they  are  relation  identifiers.  In  order  to  return  the  linked 
list  structures  of  these  sets  to  the  storage  pool,  a  separate 
table  should  be  maintained  that  includes  the  identifier  of 
the  set  (integer)  and  its  origination  (i.e.,  which  composite 
set  being  forced  caused  that  set  to  be  created).* 

Suppose  we  did  not  define  this  mechanism,  and  for  each 
individual  produced  by  forcing  a  composite  set  we  did  some 
complex  operation.  Then  if  the  same  individual  is  produced 

*The  reader  should  not  confuse  these  two  usages  of  the 
integers  as  identifiers.  In  summary,  we  use  the  integers 
taken  from  the  global  count  for  saving  the  state  of  the 
•Force"  function  and  for  designating  the  intermediate, 
temporary  sets  in  the  LHT.  As  long  as  we  allocate  the  same 
integers  at  the  same  points  each  time  a  high  level  composite 
set  is  forced,  no  problem  arises. 


two  times*  we  wi 11  do  this  complex  operation  redundantly  when 
the  individual  is  produced  for  the  second  time. 

Image  operation  on  the  extensionally  represented  relation  : 

We  will  need  to  modify  the  algorithm  for  image  operation 
that  we  defined  previously  because  in  this  case  it  has  to 
produce  the  individuals  of  the  resulting  set  one  at  a  time. 

The  new  algorithm  for  "Image”  operation  makes  use  of  the 
"Unit  image”  operation  for  which  we  will  be  defining  the 
algorithm  for  later.  The  algorithm  is  as  follows: 

1.  Force  the  argument  set  C. 

2.  Get  the  individual  returned  say  ”x”. 

3.  Force  the  primitive  composite  set  unimg:R:x;  take  the 
c*  individual  returned*  say  "y”. 

4.  Take  the  next  integer  from  the  global  county  hash 
with  this  individual  into  the  LHT  with  the  integer  obtained 
above  being  the  relation  identifier.  If  this  individual  has 
a  record  in  the  LHT  under  this  relation  identifier  go  to  step 
3  in  order  to  continue  with  the  next  individual  of  the 
composite  set  unimg:R:x,  otherwise  do  step  5. 

5.  Establish  the  record  of  this  individual  in  the  LHT 
under  the  relation  identifier  (integer)  obtained  in  step  4; 
return  this  individual  as  the  result. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 
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As  can  be  seen  we  executed  the  "Unit  image"  algorithm  for 
each  individual  of  the  set  C.  So  we  made  a  number  of  memory 
references  proportional  to  ”n"  for  each  individual  of  the  set 
C,  where  ”n"  is  the  cardinality  of  the  LEM  set  of  the 
extensionally  represented  relation.  Suppose  the  cardinality 
of  the  set  C  is  equal  to  "n",  then  the  time  complexity 
behaviour  of  the  algorithm  is  0(n2).  But  suppose  the 
cardinality  of  the  set  C  is  equal  to  the  square  of  ”n".  This 
time  the  time  complexity  behaviour  of  this  algorithm  becomes 
0(n3).  on  the  other  hand  the  cardinality  of  the  set  C  may 
even  be  1  (singlteon  set);  in  that  case  the  time  complexity 
behaviour  of  the  algorithm  becomes  0(n),  which  is  the  same  as 
the  asymptotical  time  complexity  of  the  "unimg”  algorithm 
that  will  be  given  later.  So  we  conclude  that  the  time 
complexity  behaviour  of  the  algorithm  is  strongly  dependent 
on  the  cardinality  of  the  argument  set. 

(R  -  S) IsC 

The  algorithm  for  this  case  is  as  follows: 

1.  Force  the  set  C. 

2.  Get  the  individual  returned.  Call  this  individual 

■x" . 

3.  Force  unimg:R:x;  get  the  individual  returned.  Call 
this  individual  "y".  Get  the  next  integer  from  the  global 
county  hash  into  the  LHT  with  this  integer  being  the  relation 
identifier*  check  if  there  is  a  record  for  "y”  in  the  LHT 
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under  this  relation  identifier,  and  if  so  do  step  3-a,  else 
do  step  3-b. 

a.  Repeat  step  3  in  order  to  continue  with  the  next 
individual  of  the  uniag:R:x. 

b.  Force  the  uniag':S:y  repeatedly  and  get  the 
individuals  one  at  a  tiae.  If  any  of  those  individuals  is 
the  saae  as  ”x",  or  if  ”y”  is  not  in  the  lea:S  (i.e.,  the 
(uniag':S:)  operation  is  not  applicable)  quit  forcing  the 
uniag'*S:y  and  go  to  step  3  in  order  to  continue  with  the 
next  individual  of  the  coaposite  set  uniag:R:x.  Rise  do  step 
3-c  • 

c.  Establish  the  record  of  ”y"  in  the  LHT  with  the 
integer  obtained  in  step  3  being  the  relation  identifier  and 
return  "y*. 

4.  If  as  a  result  of  the  subsequent  force  operations  no 
individuals  of  the  uniag:R:x  reaains  to  be  returned,  force 
the  set  C  in  order  to  get  the  next  eleaent  of  it  and  continue 
froa  step  2  as  the  algoritha  is  forced  further. 

Mow  we  will  do  the  worst  case  asyaptotical  tiae 
coaplexity  analysis  of  this  algoritha. 

Assuaing  we  will  force  the  above  algorithm  until  no 
individual  of  the  set  (R-S)i:C  reaains  to  be  produced,  and 
assuaing  in  the  worst  case  the  relations  R  and  S  are 
disjoint,  we  aake  the  nuaber  of  aeaory  references 
proportional  to  "n"  for  each  individual  of  the  set  C,  and  for 
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each  "y*  obtained  in  this  manner ,  because  of  the  relations 
are  disjoint/  we  make  the  number  of  memory  references 
proportional  to  ”n"  where  "n"  is  assumed  to  be  the 
cardinality  of  the  both  lemtR  and  rim:S.  Let's  assume  the 
cardinality  of  the  argument  set  is  also  "n”/  so  it  is  clear 
that  we  are  making  a  number  of  memory  references  proportional 
to  the  cube  of  ”n".  This  leads  us  to  the  fact  that/  if  the 
operand  relations  are  represented  extensionally  the  algorithm 
behaves  as  an  0(n3)  algorithm;  if  the  operand  relations  are 
composite  relations  the  cost  of  this  algorithm  increases 
significantly  depending  on  the  cost  of  the  "unimg"  and 
■unimg'"  operations  on  those  composite  relations. 

(R#S) 1 :C 

The  algorithm  for  this  case  is  as  follows: 

1.  Force  the  set  C. 

2.  Get  the  individuals  of  the  set  C  one  at  a  time. 

3.  For  each  individual  found  in  the  above  manner  (which 
is  necessarily  a  pair/  otherwise  the  operation  is 
undefined)/  extract  the  left  individual  of  this  pair/  apply 
the  relation  R  to  this  individual/  and  in  the  same  manner 
apply  the  relation  S  to  the  right  individual  of  the  pair.  If 
both  application  operations  return  individuals/  construct  a 
pair  relation  with  these  individuals  and  return  the  pointer 
to  the  record  of  this  pair. 


Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

The  algorithm  has  the  same  worst  case  asmptotical 
complexity  behaviour  as  the  algorithm  defined  for  the 
composite  set  (R#S)ltC  and  the  same  argument  applies. 

(RS) !:C 

The  algorithm  for  this  case  can  be  defined  as  follows: 

1.  Force  the  set  C. 

2.  Get  the  individual  returned  and  call  it  ”x". 

3.  Force  unimg:S:x]  Get  the  individual  returned  and  call 
it  "y". 

4.  Take  the  next  integer  from  the  global  count  and  hash 
into  the  LHT  with  this  integer  being  the  relation  identifier 
and  check  if  "y*  has  a  record  in  the  LHT  under  this  relation 
identifier,  if  so  go  to  step  3  in  order  to  continue  with  the 
next  individual  of  the  unimg:S:x  or  if  no  individual  of  the 
unimg:S:x  remains  to  be  produced  go  to  step  1  in  order  to 
continue  with  the  next  individual  of  the  argument  set  C. 
Otherwise  do  step  5. 

5.  Establish  the  record  of  ”y”  in  the  LHT  with  the 
relation  identifier  being  the  integer  taken  from  the  global 
count  in  step  4,  and  force  unimg:R:y.  Take  the  individual 
returned  and  do  step  6. 

6.  Take  the  next  integer  from  the  global  count,  hash  into 
the  LHT  with  this  integer  being  the  relation  identifier. 


check  if  there  exists  a  record  for  this  individual  in  the  LHT 


*  ■*-  - 


with  this  integer  being  the  relation  identifier.  If  so  go  to 
step  5  in  order  to  continue  with  the  next  individual  of  the 
unimg:R:y,  and  if  no  individual  remains  to  be  returned  from 
unimg:R:y,  go  to  step  4  in  order  to  continue  with  the  next 
individual  of  the  unimg:S:x  or  the  argument  set  C.  Otherwise 
establish  the  record  of  the  individual  obtained  from 
unimg:R:y  in  the  LHT  with  the  relation  identifier  being  the 
integer  taken  from  the  global  count  in  step  6  and  return  this 
individual. 

How  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

Suppose  the  relations  R  and  S  are  extensionally 
represented  relations.  If  we  force  the  above  algorithm 
repeatedly  until  no  individual  remains  to  be  produced  by 
forcing  unimg:S:x  for  each  individual  "x”  of  the  set  C,  we 
make  a  number  of  memory  references  proportional  to  the  square 
of  V,  where  "n*  is  assumed  to  be  the  cardinality  of  both 
the  argument  set  C  and  the  lem:S.  In  the  worst  case  we 
produce  the  complete  LZM  set  of  the  relation  S  without  any 
repetition  of  the  individuals,  because  we  save  the 
individuals  that  have  been  produced  until  now  by  establishing 
each  individual's  record  in  the  LHT.  So  in  the  second  part 
(step  5)  of  the  algorithm  we  force  the  unimg:R:y,  at  most  "n" 
times,  for  each  of  the  "n”  y's,  where  ”n”  is  the  cardinality 
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of  the  lem:S.  So  in  that  part  of  the  algorithm  we  make  a 
number  of  memory  references  proportional  to  the  square  of 
"n",  where  the  cardinality  of  the  lem:R  is  also  assumed  to  be 
Mn”.  Under  those  conditions  the  execution  of  the  above 
algorithm  is  effectively  the  same  as  the  execution  of  two 
0(n2)  algorithms  sequentially.  That  means  our  algorithm  has 
the  worst  case  asymptotical  time  complexity  behavior  of 
0(n2) . 

unimg:R:x  (Where  R  is  Represented  Extensionally) 

This  can  refer  to  the  "unimg”  operation  on  an 
extensionally  represented  relation.  Note  that  in  this  case  R 
should  not  be  considered  as  a  composite  relation.  The 
algorithm  for  this  operation  is  essentially  the  same  as  the 
algorithm  that  we  defined  in  the  extensional  representation 
analysis,  but  in  this  case  we  are  not  producing  all  the 
individuals  of  the  resulting  set  at  once.  Instead,  the  first 
time  this  operation  is  forced  we  find  the  first  individual 
that  belongs  to  the  resulting  set  and  we  advance  the  pointer 
to  the  next  individual's  record  in  the  LEM  set  of  the 
relation  (if  there  is  one).  Then  we  take  the  next  integer 
from  the  global  count,  and  we  establish  the  pointer  in  the 
hash  table  with  this  integer  being  the  identifier.  We  then 
return  the  individual  we  found  to  the  caller.  If  we 
exhausted  the  LEM  set  of  the  relation  after  repeated  force 
operations,  we  save  0  in  the  MHASH  table  using  the  same 


143 


integer  as  the  identifier.  So  the  aain  difference  is  we 
return  the  individuals  of  the  resulting  set  one  at  a  time  as 
this  coaposite  set  is  being  forced. 

Now  we  will  do  the  worst  case  asyaptotical  tiae 
coaiplexity  analysis  of  this  algoritha. 

The  algoritha  has  the  saae  asyaptotical  tiae 
coaplexity  behaviour  as  the  algoritha  that  we  defined  in  the 
extensional  representation  analysis  for  this  operation#  but 
in  that  case  it  is  forced  repeatedly  until  no  individual 
reaains  to  be  produced, 
uniag:  (RS)  :x 

The  algoritha  for  this  case  can  be  defined  as  follows: 

1.  Force  the  uniag :S:x. 

2.  Get  the  individual  returned  and  call  this  individual 

■y" . 

3.  Force  the  uniag:R:y;  get  the  individual  returned; 
take  the  next  integer  f roa  the  global  count  and  hash  into  the 
LHT  with  the  individual  in  question  under  the  relation 
identifier  (Integer)  obtained  above.  If  this  individual  does 
not  have  a  record  in  this  set#  or  if  this  set  does  not  exist# 
establish  the  record  of  this  individual  in  the  LHT  with  the 
relation  identifier  being  the  integer  obtained  above  and 
return  this  individual  as  the  result.  Else#  if  this  set 
already  exists  (resulting  froa  previous  executions  of  this 
algoritha)  and  if  this  individual  is  represented  by  a  record 


in  this  set#  quit  with  this  individual  and  continue  with  the 
next  individual  of  the  composite  set  unimg:R:y.  In  the  case 
no  individual  of  this  composite  set  remains  to  be  produced  go 
to  step  1  in  order  to  continue  with  the  next  individual  of 
the  composite  set  unimg:R:x.  (*This  step  prevents  us  from 
producing  the  same  individuals  repeatedly*).  Mote  that  when 
we  force  this  composite  set  a  second  time  we  get  the  next 
individual  to  be  returned  by  the  force  operation  on  unimg:R:y 
if  there  remains  an  individual  to  be  returned.  Suppose  there 
is  no  individual  that  remains  to  be  returned.  The  algorithm 
continues  with  the  next  individual  to  be  returned  by  the 
force  operation  on  the  composite  set  unimg:S:x  and  repeats 
steps  2#  3  and  4. 

How  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

As  can  be  seen#  for  each  individual  obtained  by  forcing 
the  unimgtSxx  we  force  the  unimg:R:y#  where  ”y”  is  the 
individual  obtained  by  forcing  the  unimg:S:x.  Assuming  the 
relations  R  and  S  are  extensionally  represented  relations  and 
assuming  we  force  the  composite  set#  unimg:(RS):x  until  no 
individuals  remain  to  be  produced #  we  make  a  number  of  memory 
references  proportional  to  the  square  of  *n%  where  ”n”  is 
assumed  to  be  the  common  cardinality  of  the  LEM  set  of  R  and 
the  LEM  set  of  S.  So  the  algorithm  has  the  worst  case 
asymptotical  time  complexity  behaviour  of  0(n2). 


lm:R  (Where  R  is  Represented  Extsnsionally) 

This  csss  csfsrs  to  ths  "Is*:"  operation  on  an 
extsnsionally  represented  relation.  The  algorithm  for  this 
case  is  as  follows: 

1.  Hash  to  the  relation  table  with  the  relation 
identifier:  find  the  record  of  the  relation. 

2.  Follow  the  PFZM  field  of  this  record  and  find  the 
record  of  the  first  LEM  set  individual. 

3.  Advance  the  pointer  to  the  next  record;  take  the 
next  integer  frosi  the  global  count  and  save  this  pointer  in 
the  MHASH  table  with  this  integer  as  the  identifier. 

4.  if  you  are  forced  again,  take  the  next  integer  from 
the  global  count,  hash  with  this  integer  into  the  MHASH 
table,  take  the  pointer  stored  with  this  integer  as  the 
identifier,  decrement  the  global  count,  and  repeat  step  3. 

As  can  be  seen  the  algor  itha  returns  one  individual  at  a 
tiae  as  it  is  forced. 

Mow  we  will  do  the  worst  case  asymptotical  tiae 
complexity  analysis  of  this  algorithm. 

In  the  worst  case  the  algorithm  is  forced  until  no 
individual  remains  to  be  returned  froa  the  LEM  set  of  the 
relation.  This  effectively  corresponds  to  tracing  through 
the  LEM  set  of  the  relation.  That  means  we  sake  a  number  of 
memory  references  proportional  to  V,  where  ”n”  is  the 
cardinality  of  the  LEM  set  of  the  relation.  So  we  conclude 


that  tha  algorithm  has  the  worst  case  asymptotical  time 
complexity  behaviour  of  0(n). 
lera: (R-S) 

The  algorithm  for  this  case  is  as  follows: 

1.  Force  the  rim:R. 

2.  Get  the  individual  returned  and  call  this  individual 

"y", 

3.  Force  the  unimg:Rsy. 

4.  Get  the  individual  returned  and  call  it  ”z”. 

5.  Force  the  unimg'tStz 

6.  Get  the  individual  returned#  if  this  individual  is 
the  same  as  "y",  go  to  step  3  in  order  to  continue  with  the 
next  *z".  If  no  individual  remains  to  be  produced  from  the 
set  unimg:R:y  then  go  to  step  1  in  order  to  continue  with  the 
next  "y”  (i.e.»  get  prepared  for  continuing  with  the  next 
individual  of  the  rim:R  in  the  case  the  algorithm  is  forced 
subsequently).  Otherwise  do  step  7. 

7.  Take  the  next  integer  from  the  global  count;  hash 
into  the  L8T  with  this  integer  being  the  relation  identifier. 
Check  if  the  individual  to  be  returned  (current  ”y”)  has  a 
record  in  the  LHT  under  this  identifier.  If  so  go  to  step  3 
or  if  no  individual  of  the  composite  set  unimg:R:y  remains  to 
be  produced  in  step  3#  go  to  step  5  and  execute  step  5  for 
the  current  z.  Else  establish  the  record  of  this  individual 


I 


& 


p  it* 


I 


with  the  integer  taken  from  the  global  count  being  the 
relation  identifier  and  return  this  individual. 

In  the  above  algorithm  getting  prepared  for  producing  the 
next  individual  of  the  set  rim:R  is  no  more  than  erasing  the 
pointers  saved  for  the  unimg:R:y  and  the  unimg':S:z 
operations  in  the  MHASH  table.  So  a  subsequent  forcing 
operation  will  find  that  there  does  not  exist  any  pointer 
saved  for  the  unimg:R:y  operation  and  will  automatically 
force  the  set  rim:R  in  order  to  get  the  next  individual 
following  the  "y".  Mote  that  if  a  subsequent  force  occurs^ 
and  if  there  remains  individuals  in  the  set  unimg':S:z,  the 
algorithm  begins  from  step  4,  and  produces  the  next 
individual  of  the  unimg':S:z  which  is  to  be  tested  against 
the  particular  "y"  in  question.  If  no  individual  remains  to 
be  returned  from  the  set  unimg':S:z,  then  the  algorithm 
begins  from  step  3,  and  produces  the  next  ”y"  and  so  on. 

Mow  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

In  the  worst  case  the  relations  R  and  S  may  be  disjoint 
and  in  this  case*  assuming  we  force  the  composite  set  lem:(R- 
S)  until  no  individual  remains  to  be  returned,  the  structure 
of  the  algorithm  effectively  becomes  similar  to  the  three 
nested  "for”  loops.  Assuming  the  cardinalities  of  the  rim:R, 
lemtR  and  the  rim:S  are  all  equal  to  n,  the  operation  makes  a 
number  of  memory  references  proportional  to  the  cube  of  nr 
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because  we  are  making  a  number  of  memory  references 
proportional  to  "n"  for  each  individual  of  the  composite  set 
rim:R.  By  forcing  the  unimg:R:y  for  each  individual  obtained 
from  this  composite  set  we  are  making  a  number  of  memory 
references  proportional  to  "n"  by  forcing  the  composite  set 
unimg':S:z.  So  we  conclude  that  the  worst  case  asymptotical 
time  complexity  behaviour  of  this  algorithm  is  0(n3). 
lem(RfS) 

The  algorithm  for  this  operation  can  be  defined  as 
follows : 

1.  Force  the  rim:R;  get  the  individual  returned. 

2.  Apply  R  and  S  to  this  individual.  If  both 
applications  return  individuals  construct  a  pair  relation  out 
of  the  individuals  returned.  Else  go  to  step  1. 

3.  Return  the  pointer  to  the  record  of  that  pair 
relation  which  is  established  in  the  relation  table. 

4.  Do  the  same  as  above  in  the  subsequent  force 
operations  on  this  algorithm  by  producing  the  individuals  of 
the  rim:R  one  at  a  time. 

In  the  above  algorithm  the  application  operation  on  S  may 
not  always  return  an  individual,  because  not  all  individuals 
of  the  rim:R  are  necessarily  in  the  rim:S.  In  fact  we  should 
have  produced  the  individuals  of  the  intersection  of  the 
operand  relations'  RIM  sets  in  step  1,  but  this  would  be  a 
very  costly  operation  by  requiring  the  individuals  of  the  RIM 
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set  o£  either  or  both  of  the  relation  R  and  relation  S  to  be 
saved  temporarily.  So  we  used  the  application  operation's 
filtering  property  and  we  only  produced  the  RIM  set  of 
relation  R.  Given  an  individual  in  the  rim:R  if  this 
individual  is  net  in  the  riit:S,  application  operation  on  S 
fails  to  produce  an  individual  as  a  result,  because,  we  need 
two  individuals  in  order  to  construct  a  pair  relation.  In 
this  case  our  algorithm  continues  with  the  next  individual  of 
the  composite  set  rim:R. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

In  this  algorithm,  assuming  R  and  S  are  extensionally 
represented  relations,  we  make  a  constant  number  of  memory 
references  for  each  individual  of  the  rim:R.  Because  we  have 
to  produce  each  individual  of  the  rim:R  in  order  to  produce 
all  the  LEM  set  individuals  of  the  relation  (RiS) ,  we  make  a 
number  of  memory  references  proportional  with  ■n",  where  "n" 
is  the  cardinality  of  the  RIM  set  of  R.  So  we  conclude  that 
the  worst  case  asymptotical  time  complexity  behaviour  of  this 
algorithm  is  0(n). 
lem: (R | | S) 

When  we  constructed  the  extensional  representation 
structure  of  the  relation  ”R||S”  earlier,  we  created  all  the 
possible  ordered  pairs  that  can  be  created  from  the  LEM  set 
individuals  of  the  operand  relations  in  order  to  construct 


the  LEM  set  of  this  relation.  So  our  algorithm  does  the  same 
thing  by  constructing  those  pairs  one  at  a  time.  The 
algorithm  is  as  follows: 

1.  Force  the  lem:R;  get  the  individual  returned  and  call 
it  "x". 

2.  Force  the  lem:S;  get  the  individual  returned  and  call 
it  "y*. 

3.  Construct  a  pair  relation  with  "x*  being  the  left 
member  and  "y*  being  the  right  member. 

4.  Return  the  pointer  to  the  record  of  that  pair  which 
is  established  in  the  relation  table. 

5.  In  the  subsequent  force  operations*  continue  to 
produce  the  individuals  of  the  lem:S  and  pair  each  individual 
returned  with  "x".  Return  the  pointers  to  the  pairs  one  at  a 
time. 

6.  When  the  lem:S  is  exhausted,  produce  the  next 
individual  of  the  lem:R  and  repeat  the  above  steps  for  this 
individual  as  you  are  forced. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

Assuming  the  relations  R  and  S  are  extens ionally 
represented  relations,  and  the  algorithm  is  forced 
exhaustively,  we  produce  all  the  individuals  of  the  lem:S  for 
each  individual  of  the  lem:R,  so  if  the  cardinalities  of  the 
lem:R  and  the  lem:S  are  both  equal  to  V,  we  make  the  number 
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That 


of  memory  references  proportional  to  the  square  of  ”n”. 
means  the  algorithm  has  the  worst  case  time  complexity 
behaviour  of  Ofn?). 
lem: (RS) 

The  algorithm  for  this  case  can  be  defined  as  follows: 

1.  Force  the  lem:S;  get  the  individual  returned.  Call 
this  individual  *x". 

2.  Force  the  unimg:R:x;  get  the  individual  returned; 
hash  into  the  LHT  with  the  current  integer  (taken  from  the 
global  count)  being  the  relation  identifier;  check  if  there 
is  a  record  for  this  individual  under  this  relation 
identifier;  if  so  do  step  2-a,  else  do  step  2b. 

a.  Repeat  step  2  by  forcing  the  unimg:R:x  again  and 
taking  the  next  individual. 

b.  Establish  the  record  of  this  individual  in  the 
LHT  with  the  relation  identifier  being  the  current  integer 
taken  from  the  global  count  (i.e.,  the  integer  taken  from  the 
global  count  in  step  2).  Return  this  individual  as  a  result. 

3.  Repeat  step  2  as  the  lem:(RS)  is  forced  and  as  long 
as  there  remain  individuals  in  the  unimg:R:x. 

4.  if  there  does  not  remain  any  individual  in  the 
unimg:R:x,  repeat  steps  1  and  2  as  the  lem:(RS)  is  forced,  by 

forcing  the  lem:S  and  obtaining  the  next  individual  from  the 
lem:S. 
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In  this  algorithm  we  used  the  LHT  as  the  SCBT  for 
remembering  the  individuals  produced  until  now  and  for  not 
producing  the  same  individuals  repeatedly,  because  the  sets 
produced  by  "Unit  image”  operations  may  not  always  be 
disjoint  or  they  may  be  the  same.  If  we  did  not  do  that  we 
might  produce  the  same  individuals  repeatedly  and  if  a 
complex  operation  were  being  performed  on  each  individual 
produced  we  might  execute  that  complex  operation  redundantly. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

If  we  execute  the  above  algorithm  until  no  more 
individuals  of  the  lem:(RS)  remain  to  be  produced,  we  produce 
the  whole  LEM  set  of  the  relation  S  and  for  each  individual 

cr* 

of  that  set  we  perform  the  "unit  image”  operation.  Assuming 
R  and  S  are  extensionally  represented  relations,  and  lem:S 
and  lem:R  have  the  same  cardinality  ”n”,  we  make  a  number  of 
memory  references  proportional  to  the  square  of  ”n”.  So  the 
worst  case  asymptotical  time  complexity  behaviour  of  this 
algorithm  is  0(n2).  Note  that  this  is  true  if  and  only  if 
the  relations  R  and  S  are  extensionally  represented 
relations.  In  the  case  they  are  composite  relations  the  time 
complexity  behaviour  of  the  algorithm  may  change  depending  on 
the  time  complexity  behaviour  of  the  operations,  ”unimg:”  and 
”lem:”  on  those  composite  relations. 
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The  algorithm  for  this  operation  is  the  same  as  the 
algorithm  for  the  operation  "lem:R",  but  in  step  2  of  this 
algorithm  we  have  to  follow  the  pointer  found  in  the  PFRM 
field  of  the  relation's  record  instead  of  the  PFLM  field. 
Hence  the  asymptotical  time  complexity  behaviour  of  this 
algorithm  is  the  same  as  the  asymptotical  time  complexity 
behaviour  of  the  algorithm  for  the  "lem:R". 
rim: (R-S) 

This  algorithm  does  the  reverse  operation  that  we  defined 
in  the  algorithm  for  lem:(R-S).  The  reader  should  make  the 
substitutions  below  in  the  algorithm  for  the  "lem:(R-S)M  in 
order  to  define  the  algorithm  for  the  "rim:  (R-S)": 

lemxR - >  rimiR 

unimg :R:y  - >  unimg'tRzy 

unimg'jSxs  — — >  unimgiSts 

So  the  asymptotical  time  complexity  behaviour  of  this 
algorithm  is  the  same  as  the  asymptotical  time  complexity 
behaviour  of  the  algorithm  for  "lem: (R-S)". 
rim: (R| (S) 

This  algorithm  is  similar  to  the  algorithm  of  the 
"lem:(R|  |S)".  We  have  to  make  the  substitutions  below  in  the 
steps  of  the  algorithm  defined  for  "lem: (R |  | S)"  in  order  to 
define  the  algorithm  for  "rim:(R|  |S)": 


lem:R 


>  rim:R 
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lea:S  — - >  rim:S 

So,  obviously,  the  asymptotical  time  complexity  behaviour  of 
this  algorithm  Is  the  same  as  the  asymptotical  time 
complexity  behaviour  of  the  algorithm  for  "lem:  (R|  |  S) ", 
because  the  algorithms  for  the  "lem:X"  and  "rlm:X"  have  the 
same  asymptotical  time  complexity  behaviour  when  "X”  is  an 
extensionally  represented  relation, 
rim: (RS) 

This  algorithm  is  similar  to  the  algorithm  defined  for 
"lem:(RS)”.  We  have  to  make  the  substitutions  below  in  the 
steps  of  the  algorithm  defined  for  the  ”iem:(RS)"  in  order  to 
define  the  algorithm  for  "rim:(RS)": 

lem:S - >  rim:R 

unimg:R:x — >  unimg':S:x 

So  the  asymptotical  time  complexity  behaviour  of  this 
algorithm  is  the  same  as  the  asymptotical  time  complexity 
behaviour  of  the  algorithm  defined  for  "lem:(RS)",  because 
the  algorithms  defined  for  the  unimg:W:x  and  unimg':W:x  have 
the  same  asymptotical  time  complexity  behaviour  if  the 
relation  W  is  an  extensionally  represented  relation;  in  the 
same  way,  the  algorithms  defined  for  the  lem:W  and  rim:W  have 
the  same  asymptotical  time  complexity  behaviour  when  the 
relation  ”W”  is  an  extensionally  represented  relation. 
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This  algorithm  is  similar  to  the  algorithm  defined  for 
the  (primitive)  "Image”  operation.  We  have  to  make  the 
substitution  below  in  the  algorithm  for  the  "Image"  operation 
in  order  to  define  the  algorithm  for  this  operation# 
unimg - >  unimg' 

So  the  asymptotical  time  complexity  behaviour  of  this 
algorithm  is  the  same  as  the  asymptotical  time  complexity 
behaviour  of  the  algorithm  defined  for  the  (primitive) 
"Image”  operation. 

unimg' tR>x  (Where  R  is  an  Bxtensionally  Represented  Relation) 
This  operation#  given  an  individual  in  the  codomain  of  a 
relation#  finds  the  set  of  individuals  that  are  in  relation 
with  this  individual  in  the  domain  of  the  relation  in 
question.  Of  course  in  our  case  the  individuals  of  the 
resulting  set  will  be  returned  to  the  caller  (function 
"Force”  itself)  one  at  a  time.  We  did  not  define  the 
algorithm  for  this  operation  in  the  extensional 
representations  analysis#  so  we  will  define  this  algorithm 
here.  The  algorithm  is  as  follows: 

1.  Find  the  relation's  record  in  the  relation  table# 
follow  the  pointer  found  in  the  PFRM  field  of  this  record# 
and  find  the  record  of  the  first  right  member  of  this 
relation  in  the  RHT. 


2.  Bash  into  the  LBT  with  the  argument  individual  under 
the  relation  identifier;  find  the  record  of  this  individual. 

3.  Reference  the  incidence  vector  of  the  relation  with 
the  indices  of  the  records  found;  if  a  1  is  found  in  the 
corresponding  location  do  step  3-a  else  to  step  3-b. 

a.  Take  the  next  integer  from  the  global  count#  set 
a  pointer  to  the  next  record  in  the  RIM  set  of  the  relation# 
hash  into  the  MHASH  table  with  this  integer  as  the  identifier 
and  save  this  pointer  in  the  MHASH  table  by  establishing  a 
record. 

b.  Proceed  in  the  RIM  set  by  following  the  pointer 
found  in  the  TASE  link  field  of  the  current  RHT  record  and 
repeat  steps  2  and  3  for  the  next  individual  in  the  RIM  set 
of  the  relation  R. 

4.  As  the  unimg':R:x  is  forced#  get  the  same  integer 
obtained  in  step  2#  by  going  through  the  same  counting 
mechanism;  hash  into  the  MHASH  table  and  find  the  pointer 
save;  follow  this  pointer  and  find  the  individuals  record  in 
the  RHT#  then  repeat  steps  2  and  3. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

If  we  force  the  above  algorithm  repeatedly  until  no  more 
individuals  remain  to  be  returned#  we  trace  through  the 
entire  RIM  set  of  the  relation  R#  so  we  make  a  number  of 
memory  references  proportional  to  ”n"#  where  "n*  is  the 
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cardinality  of  the  RIM  sat  of  the  relation  in  question.  So 
the  algorithm  has  the  worst  case  asymptotical  time  complexity 
behaviour  of  0{n)., 

(RfS)cttC 

We  know  that  the  converse  of  relation  (R#S)  is  not 
necessarily  an  injective  function,  even  though  the  relation 
(R#S)  is  by  definition  an  injective  function.  So  the 
algorithm  is  more  costly  then  the  algorithm  defined  for 
(R#S) ! sC.  The  algorithm  is  as  follows: 

1.  Force  the  argument  set  C,  repeatedly. 

2.  For  each  individual  (which  is  necessarily  a  pair) 
obtained  in  this  manner,  take  the  left  individual  of  this 
pair,  and  call  it  ax".  Force  unimg':R:x  repeatedly, 
establishing  the  record  of  each  individual  returned,  in  the 
LHT  with  the  integer  taken  from  the  global  count  as  the 
identifier.  In  the  same  manner  take  the  right  individual  of 
the  pair  and  call  it  "y*.  Force  unimg':S;y  repeatedly, 
establishing  the  records  of  the  individuals  returned  in  the 
LHT  with  the  next  integer  taken  from  the  global  count  being 
the  relation  identifier.  In  the  same  manner  take  the  right 
individual  of  the  pair  and  call  it  ”y”.  Force  unimg':S:y 
repeatedly,  establishing  the  records  of  the  individuals 
returned  in  the  LHT  with  the  next  integer  taken  from  the 
global  count  being  the  relation  identifier. 


3.  After  tae  set  C  is  exhausted,  take  the  intersection  of 
the  two  sets  created  in  the  LBT  by  using  the  algorithm 
defined  for  the  "Set  intersection"  in  the  extensional 
representations  analysis,  and  establish  the  resulting  set  in 
the  LBT  with  the  next  integer  taken  from  the  global  count  as 
the  identifier.  Return  the  first  individual  of  this  set, 
take  the  next  integer  from  the  global  count  and  save  the 
pointer  to  the  next  individual's  record  in  the  MBASB  table 
with  this  integer  as  the  identifier. 

4.  In  the  subsequent  force  operations  do  not  perform 
steps  1  through  3,  but  perform  the  counting  operation  done  in 
each  step;  i.e.,  take  the  integers  from  the  global  count  and 
hash  with  the  last  integer  found  in  step  3  into  the  MBASB 
table.  If  a  pointer  is  found  to  be  in  the  MBASB  table  with 
this  integer  as  the  identifier,  return  the  individual  whose 
record  is  pointed  by  this  pointer,  advance  the  pointer  to  the 
next  record  of  the  set  obtained  in  step  3,  and  save  the  new 
pointer  in  the  MBASB  table  with  the  same  integer  being  the 
identifier . 

So  if  we  force  the  above  algorithm  for  the  first  time,  it 
constructs  the  set  of  all  individuals  to  be  returned  in  the 
subsequent  force  operations  also,  and  returns  the  first 
individual.  In  the  subsequent  force  operations,  steps  1 
through  3  are  not  executed  except  in  order  to  find  the  same 
integer  found  in  step  3.  We  have  to  go  through  the  same 


counting  process  on  the  global  count  each  time  the  algorithm 
is  forced. 

Mow  we  will  do  the  worst  case  asymptotically  time 
complexity  analysis  of  this  algorithm. 

For  each  individual  of  the  set  C  we  force  the  unimg':R 
and  unimg':S.  Assuming  R,  S  and  C  are  extensionally 
represented,  we  make  the  number  of  memory  references 
proportional  to  the  "n*  for  each  individual  of  the  set  C.  If 
we  assume  that  the  sets  C»  rim:R  and  rim:S  all  have  the  same 
cardinality  V,  we  make  a  number  of  memory  references 
proportional  to  the  square  of  ”n*.  Taking  the  intersection 
of  two  sets  (in  step  3)  coses  us  a  number  of  memory 
references  proportional  to  the  "n"  (in  the  worst  cast  when 
the  cardinality  of  those  sets  are  exactly  equal  to  "n"). 
Because  this  factor  is  added  to  the  previous  term  of  the 
complexity  function,  it  does  not  affect  the  asymptotical  time 
complexity  behaviour  of  the  algorithm.  So  we  conclude  that 
the  algorithm  has  the  worst  case  asymptotical  time  complexity 
behaviour  of  0(n2). 
unimg:(R#S)c  ■  unimg':(R#S) 

This  algorithm  does  the  same  things  as  the  algorithm  for 
(R#S)  l  :C  does  except  in  this  case  the  argument  is  not  a  set 
(C);  instead  it  is  an  individual.  So  if  the  algorithm  for 
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(R#S)  1  sC  executes  on  a  singleton  set,  it  does  the  job  of  this 
algorithm,  so  there  is  no  need  to  rewrite  this  algorithm 
here. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

Because  the  algorithm  is  the  special  case  of  the 
algorithm  for  (R#S)1:C,  it  makes  the  number  of  memory 
references  proportional  to  the  ”n"  for  the  given  argument 
individual,  and  it  performs  the  set  intersection  operation  in 
0(n)  time.  So  with  the  same  assumptions  we  had  in  the  time 
complexity  analysis  of  the  algorithm  for  (RfS)l:C,  we 
conclude  that  the  worst  case  asymptotical  time  complexity 
behaviour  of  this  algorithm  is  0{n). 

(R |  |  S)  c  1  tC 

This  algorithm  is  similar  to  the  algorithm  for  (R#S)cl:C 
except  the  third  step  is  more  costly  than  the  third  step  of 
the  algorithm  of  the  (R#S)c!:C.  The  algorithm  is  as  follows: 

1.  Do  steps  1  and  2  of  the  algorithm  defined  for  the 
(R#S)cI :C. 

2.  Start  from  the  beginning  of  the  set  obtained  from 
repeated  application  of  unimg':R  operation.  For  each 
individual  found  by  proceeding  in  that  set,  start  from  the 
beginning  of  the  set  obtained  from  the  repeated  application 
of  unimg’sS  operation  and  proceed  in  that  set  record  by 


record#  by  looking  up  the  individual  being  represented  by 
each  record  encountered. 

3.  For  each  individual  pair  obtained  in  step  2# 
construct  a  pair  relation  (as  it  was  done  in  the  algorithm 
for  "Parallel  application"  in  the  extensional  representations 
analysis).  Take  the  next  integer  from  the  global  count# 
establish  the  record  of  the  individual  (pair)  in  the  LHT  with 
the  relation  identifier  being  this  integer#  and  link  the 
records  of  that  kind  by  their  TASS  links  as  they  are  created. 

4.  Repeat  step  4  of  the  algorithm  defined  for  the 
(R*S) I :C. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

As  we  can  see  the  algorithm  differs  from  the  algorithm 
for  (R#S)!:C  in  the  third  step  only.  So  instead  of 
intersecting  the  sets#  we  construct  a  set  which  has  pairs 
resulted  from  pairing  up  the  individuals  of  those  sets.  In 
constructing  pairs  out  of  the  individuals  of  the  sets 
obtained  in  step  1  of  this  algorithm#  we  make  a  number  of 
memory  references  proportional  to  the  square  of  ”n"  where  "n" 
is  assumed  to  be  the  common  cardinality  of  those  sets  and  the 
cardinality  of  the  rim:R  and  rim:S.  Because  this  term  will 
replace  the  linear  term  corresponding  to  the  set  intersection 
in  the  time  complexity  function  of  the  algorithm  of  the 
(R#S)cl:C#  and  because  we  already  have  an  order  two  term  in 


162 


tr* 


that  complexity  function/  this  additional  second  degree  term 
will  not  change  the  asymptotical  time  complexity  behaviour  of 
the  previous  complexity  function.  So  we  conclude  that  the 
algorithm  has  the  worst  case  asymptotical  time  complexity 
behaviour  of  0(n2).  In  practical  sense  it  is  more  expensive 
than  the  algorithm  for  the  (R#S)cl:C. 
unimg: (R| | S)c:x  ■  unimg ' : ( R ] | S) : x 

This  algorithm  is  similar  to  the  algorithm  defined  for 
the  (R||S)cl:C  except  the  argument  is  an  individual  rather 
than  a  set.  Hence/  if  we  force  the  (R|!S)ci:C  only  once, 
this  would  be  equivalent  to  forcing  the  unimg':  (R|  |  S)c:x, 
where  "x"  is  the  first  individual  of  the  set  "C".  Because  of 
this,  we  do  not  need  to  rewrite  this  algorithm  again. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

Even  though  we  can  find  the  intermediate  sets  in  step  1 
of  the  algorithm  referenced  above  by  making  a  number  of 
memory  references  proportional  to  "n"  (where  "n"  is  the  same 
as  the  "n"  defined  in  the  time  complexity  analysis  of  the 
algorithm  for  the  {R||S)i:C)f  we  have  to  pair  up  the 
individuals  of  those  sets/  which  requires  a  number  of  memory 
references  proportional  to  the  square  of  "n".  So  the  worst 
case  asymptotical  time  complexity  behaviour  of  this  algorithm 
becomes  0(n2). 
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(non:R)l:C,  unimg:  (non:R)  :x  (Where  R  is  Represented  Exten¬ 
sionally) 

When  the  relation  R  is  represented  extensionally  we  can 
do  these  operations  (production  of  individuals)  in  a  less 
complex  manner  than  the  algorithms  that  we  will  define  for 
general  case. 

The  algorithms  for  these  operations  are  exactly  the  same 
as  the  algorithms  defined  for  R1:C  and  unimg:R:x  that  work  on 
the  extensionally  represented  R,  except  we  will  design  these 
algorithms  so  that  it  will  accept  every  0  found  in  the  hash 
incidence  vector  as  1  and  every  1  as  0,  by  complementing 
every  entry  of  the  incidence  vector  tested  without  changing 
the  original  entry  during  the  operations.  By  doing  that  we 
do  not  have  to  complement  all  the  entries  of  the  incidence 
vector  by  executing  the  "Complement*  algorithm  defined  in  the 
extensional  representations  analysis;  instead,  our  algorithms 
that  work  on  the  complement  of  the  relation  in  question, 
assume  every  0  as  1  and  every  1  as  0.  Because  we  are  using 
essentially  the  same  algorithms  defined  for  R!:C  and 
unimg:R:x,  it  is  obvious  that  these  algorithms  will  also  have 
the  same  asymptotical  time  complexity  behaviours. 

(non:R)l:C  (General  Case) 

This  composite  set  presents  some  difficulties  in  defining 
the  algorithm  for  it  because  the  resulting  set  strictly 
depends  on  the  tuples  of  the  composite  relation  R  which  are 
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not  being  represented  explicitly.  The  algorithm  is  as 
follows : 

1.  Porce  the  lem:R  repeatedly;  take  the  next  integer 
from  the  global  count;  establish  the  record  of  each 
individual  obtained  in  the  LHT  with  this  integer  as  the 
identifier. 

2.  Porce  the  set  C  repeatedly;  for  each  individual  *x" 
obtained  in  this  manner  do  step  3. 

3.  Force  unimg:R:x  repeatedly,  if  it  returns  at  least 
one  individual  increment,  the  count  called  "CARO";  for  each 
individual  obtained  by  repeatedly  forcing  the  unimg:R:x,  hash 
into  the  LHT  with  the  integer  taken  from  the  global  count  (in 

■  step  1)  as  the  identifier.  Pind  the  record  of  this 

individual  and  increment  the  integer  in  the  index  field  of 
this  record  (which  is  not  being  used  in  this  case). 

4.  After  no  more  individuals  of  C  remain  to  be  produced, 
begin  from  the  beginning  of  the  set  constructed  in  the  LHT 
and  proceed  in  that  set  record  by  record.  For  each  record 
found  look  up  the  index  field;  if  the  integer  in  the  index 
field  is  equal  to  the  last  integer  saved  in  the  counter  CARO, 
delete  *.his  record  of  the  individual  from  the  set. 

3.  Start  from  the  beginning  of  the  set  resulting  from 
the  execution  of  step  4,  and  return  the  first  individual  to 
the  caller. 
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6.  If  the  algorithm  is  forced  subsequently,  skip  steps  1 
through  4,  but  allocate  the  integer  taken  from  the  global 
count  in  step  1,  and  return  the  next  individual  that  remains 
to  be  returned  in  the  set  resulting  from  the  execution  of 
step  4  during  the  first  forcing  operation. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

Suppose  the  argument  C  is  an  extensionally  represented 
set,  and  the  unimg  operation  on  the  composite  relation  R  has 
an  O(n)  time  algorithm,  where  "n*  is  the  cardinality  of  LEM 
set  of  the  composite  relation  R.  Because  we  are  forcing  the 
unimg :R  operation  repeatedly  with  the  arguments  being  each 
_ individual  of  the  set  C,  assuming  the  set  C  has  the 

V* 

cardinality  ”n",  we  make  the  number  of  memory  references 
proportional  to  the  square  of  "n".  So  under  these 
assumptions,  the  algorithm  has  the  asymptotical  time 
complexity  behaviour  of  0(m2).  in  finding  the  asymptotical 
time  complexity  behaviour  of  this  algorithm,  we  did  not  take 
into  account  the  establishment  of  the  lem:R  in  the  LHT, 
because  the  term  corresponding  to  this  operation  is  added  to 
the  term  which  we  found  above,  and  in  most  of  the  cases  this 
term  is  a  linear  term,  so  it  does  not  affect  the  asymptotical 
time  complexity  behaviour  of  the  algorithm.  Note  that  this 
cost  should  be  attributed  to  the  first  forcing  operation  done 
on  this  composite  set;  the  subsequent  force  operations  cost 
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us  constant  time#  because  we  established  the  set  of 
individuals  to  be  returned  when  this  composite  set  was  forced 
the  first  time.  That  means  the  subsequent  force  operations 
effectively  force  an  extensionally  represented  set  and  each 
force  operation  becomes  a  constant  time  operation. 
fan:Rl :C 

Producing  the  individuals  of  this  set  is  a  costly 
operation  which  requires  repeated  execution  of  the  expensive 
"Image"  operation.  The  algorithm  is  as  follows: 

1.  Take  the  first  integer  from  the  global  count;  force 
the  lem:R  repeatedly  and  increment  COUNT  1  for  each 
individual  produced;  establish  each  individual  produced  in 

^ ^  the  LHT,  with  this  integer  as  the  identifier.  In  the  same 

manner,  force  the  rim:R  repeatedly#  hash  into  the  LHT  with 
each  individual  under  the  above  relation  identifier.  If  this 
individual  does  not  have  any  record  in  this  set#  increment 
COUNT  2#  else  do  nothing. 

2.  Take  the  next  integer  from  the  global  count;  force 
the  composite  set  R!:C  repeatedly;  for  each  individual  of  the 
set  C  obtained  during  that  operation#  hash  into  the  LHT  with 
the  next  integer  taken  from  the  global  count  as  the 
identifier  and  establish  the  record  of  this  individual.  Call 
the  set  resulting  from  this  operation  C'.  In  the  same  manner 
take  the  next  integer  from  the  global  count  and  establish  the 
record  of  each  individual  of  the  set  resulting  from 
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repeatedly  forcing  the  R1:C  with  this  Integer  as  the  relation 
identifier . 

3.  Force  the  Image  operation  with  the  set  (other  than 
the  C')  resulting  from  the  execution  of  step  2,  being  the 
argument;  establish  the  records  of  the  resulting  set's 
individuals  in  the  LHT  under  the  same  relation  identifier 
obtained  in  step  2,  and  in  the  same  manner  as  explained  in 
step  2. 

4.  Proceed  in  the  same  manner  by  every  time  taking  the 
set  resulting  from  the  previous  step  and  forcing  the  "Image" 
operation  repeatedly  on  this  set  in  order  to  obtain  the  next 
set  and  each  time  increment  a  counter,  namely  "M".  Increment 
the  counter  COUNT  3  for  each  record  of  the  resulting  set 
created. 

5.  Do  step  4  until  M  -  COUNT  1  +  COUNT  2  or  COUNT  3  - 
COUNT  1. 

6.  Start  from  the  beginning  of  the  set  C',  for  each 
individual  found  by  proceeding  in  this  record  by  record,  hash 
into  the  LHT  with  the  resulting  set's  identifier,  (i.e., 
integer  obtained  in  step  2)  and  if  this  individual  does  not 
have  a  record  in  this  set,  establish  the  record  of  this 
individual  in  that  set. 

7.  Return  the  set  C'  to  the  storage  pool. 

8.  Return  the  set  constructed  in  step  1  to  the  storage 
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9.  Return  the  first  individual  of  the  resulting  set. 
Delete  the  record  of  this  individual  from  that  set. 

10.  if  the  algorithm  is  forced  subsequently,  skip  step  1 
through  8  and  execute  step  9,  but  under  any  condition  take 
the  integers  from  the  global  count  which  are  obtained  in  step 
2. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

We  defined  the  above  algorithm  for  general  case,  in  which 
the  relation  R  is  a  composite  relation.  But  because  it  is 
not  possible  to  list  down  an  infinite  number  of  composite 
relations,  we  will  analyze  the  case  in  which  the  relation  R 
is  an  extensionally  represented  relation. 

In  the  worst  case  each  of  the  intermediate  sets  obtained 
in  step  4  has  the  cardinality  "n-1",  where  "n"  is  the 
cardinality  oi  the  LEM  set  of  the  relation  R.  It  is  not 
equal  to  *n"  because  this  is  the  stopping  condition  of  the 
algorithm.  This  does  not  make  any  difference  in  the 
asymptotical  time  complexity  behaviour  of  the  "Image” 
algorithm,  so  we  make  a  number  of  memory  references 
proportional  to  the  square  of  "n”  for  each  repetition  of  step 
4.  we  know  that  in  the  worst  case  step  4  executes  M-2  times 
where  N  is  the  cardinality  of  the  MEM  set  of  the  relation. 
Let's  assume  the  LEM  and  the  RIM  set  of  the  relation  are 
disjoint  and  M»2*n,  where  "n"  is  again  assumed  to  be  the 
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common  cardinality  of  the  LEM  set  and  the  RIM  set  of  the 
relation.  So  it  is  clear  that  we  are  making  a  number  of 
memory  references  proportional  to  the  square  of  ”n”,  2*n 
times.  As  a  result  we  can  write  the  leading  term  of  the 
complexity  function  as  follows: 

K*(n3) 

where  K  >•  2 

So  we  conclude  that  under  these  conditions  the  algorithm  has 
the  worst  case  asymptotical  time  complexity  behaviour  of 
0(n3).  Note  that  we  did  not  take  into  account  the  cost  of 
execution  of  steps  1#  6,  7  and  8  because  the  terms  of  the 
complexity  function  associated  with  those  steps  would  be 
linear  terms  and  would  not  affect  the  asymptotical  time 
complexity  behaviour  of  the  algorithm. 

(san:R) ! :C 

This  algorithm  is  similar  to  the  algorithm  defined  for 
the  (fan:R)l:C*  except  we  do  not  create  the  set  C* ,  and  we 
have  to  omit  steps  6,  7  and  8.  So  the  asymptotical  time 
complexity  behaviour  of  this  algorithm  is  the  same  as  the 
asymptotical  time  complexity  behaviour  of  the  algorithm 
defined  for  (fan:R)l:Cr  because  the  asymptotical  time 
complexity  behaviour  of  steps  6,  7  and  8  did  not  have  any 
affect  on  the  asymptotical  time  complexity  behaviour  of  the 
algorithm  defined  for  the  (fan:R)!:C. 
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As  we  mentioned  earlier,  we  will  generalize  the  set 
operations:  Intersection,  Onion,  and  Difference  in  the 
context  of  producing  the  individuals  of  the  composite  sets. 
That  means  these  operations  will  have  special  meaning  in  our 
function  "Force”.  Each  of  those  set  operations,  being  binary 
operators,  may  take  on  any  kind  of  composite  sets  as 
operands,  and  produce  the  individuals  of  the  resulting 
composite  set.  The  algorithms  that  we  will  define  will 
establish  temporary  sets  in  the  LHT  in  order  to  produce  the 
individuals  of  the  resulting  sets  efficiently,  like  was  done 
in  some  of  the  algorithms  above.  There  exists  another 
technique  which  prevents  us  from  constructing  these  temporary 
sets,  and  requires  doing  membership  tests  for  each  individual 
to  be  produced.  We  will  give  an  example  of  this  technique  in 
defining  the  algorithm  for  the  "Intersection"  operation,  and 
explain  the  reasons  why  this  kind  of  algorithms  is  costly. 
On  the  other  hand,  our  technique  uses  more  storage,  but  we 
reuse  that  storage  many  times  by  maintaining  a  storage  pool 
as  explained  before.  Our  technique  is  aimed  at  splitting  the 
terms  of  the  time  complexity  function  rather  than  nesting  the 
terms  in  each  other  by  increasing  the  exponent  of  the  terms. 
This  can  be  done  simply  by  producing  and  saving  some  sets, 
temporarily  in  advance,  then  testing  the  individuals  to  be 
produced  against  those  sets,  rather  than  every  time  producing 
the  sets  which  the  individual  being  produced  is  to  be  tested 
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against.  Because  those  temporary  sets  will  be  established  in 
the  LHT,  the  membership  test  operations  will  be  constant  time 
operations. 

Set  Intersection  in  Producing  the  Composite  Set  Individuals 
(and) : 

In  this  algorithm  we  will  first  construct  one  of  the 
composite  operand  sets  explicitly  in  the  LHT#  then  we  will 
force  the  other  operand  set  as  the  composite  set  constructed 
with  "and"  operation#  is  forced.  Each  time  this  set  is 
forced  we  test  the  individual  to  be  produced  against  the  set 
constructed  in  the  LHT;  if  the  same  individual  also  exists  in 
that  set#  the  individual  in  question  is  produced#  otherwise 
we  continue  to  force  the  other  operand  composite  set  until  we 
find  an  eligible  individual  to  produce.  The  algorithm  is  as 
follows : 

1.  Take  the  next  integer  from  the  global  count  and  force 
the  right  operand  set  repeatedly.  For  each  individual  found 
in  this  manner#  hash  into  the  LHT  with  the  integer  taken  from 
the  global  count  above  as  the  relation  identifier.  Establish 
the  record  of  this  individual  in  the  LHT  if  it  does  not  have 
any  record  under  this  identifier  already. 

2.  Force  the  left  operand  set#  get  the  individual 
returned#  and  hash  with  that  individual  into  the  LHT  with  the 
integer  obtained  in  step  1  as  the  relation  identifier.  If 
there  exists  a  record  for  this  individual  in  that  set#  return 


this  individual  as  tha  result,  otherwise  force  the  left 
operand  set  again  in  order  to  continue  with  the  next 
individual  of  that  set. 

3.  If  none  of  the  individuals  of  the  left  operand  set  is 
found  to  be  in  the  set  (which  is  established  in  the  I*HT) 
return  "nil". 

4.  If  the  main  composite  set  is  forced  subsequently  skip 
step  1#  but  under  any  condition  take  the  integer  obtained  in 
step  1  from  the  global  count. 

As  can  be  seen  the  above  algorithm  does  not  care  about 
the  kind  of  composite  operand  sets  because  the  "Force" 
operation  being  performed  on  the  composite  operand  sets  is 
defined  for  all  kinds  of  composite  sets  and  that  is  what  we 
mean  by  general! stion. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

As  we  have  seen  in  the  previous  algorithms,  most  of  the 
operations  eventually  result  in  the  execution  of  the  "Unit 
image"  operation  on  the  extensionally  represented  relations, 
and  we  know  that  the  "Unit  image"  (algorithm,  when  it  works 
on  the  extensional  relations,  has  the  worst  case  asymptotical 
time  complexity  behaviour  of  0(n).  Except  some  special  cases 
for  which  we  defined  the  algorithms  in  this  section,  we 
expect  most  of  the  exhaustive  production  operations  to  be 
linear  time  operations,  because  step  1  and  step  2  of  this 


algorithm  are  independent  steps.  If  the  execution  of  step  1 
and  the  execution  of  step  2  have  linear  time  complexity 
behaviour,  clearly  the  algorithm  will  have  the  linear  time 
complexity  behaviour.  On  the  other  hand,  if  any  one  of  the 
steps  1  or  2  or  both  have  the  order  two  time  complexity 
behaviour#  clearly  the  algorithm  will  have  the  order  two  time 
complexity  behaviour#  and  so  on.  Hence  we  conclude  that  the 
worst  case  asymptotical  time  complexity  behaviour  of  the 
exhaustive  execution  of  the  "Force”  operation  on  the 
composite  operand  sets#  and  most  of  the  time  we  expect  the 
worst  case  asymptotical  time  complexity  behaviour  of  this 
algorithm  to  be  either  0(n)  or  0(n2). 

Now  suppose  we  defined  the  above  algorithm  by  using  the 

U? 

other  technique  we  mentioned  above#  in  which  case  we  would 
force  the  left  operand  set  and  obtain  the  individual#  then  we 
would  force  the  right  operand  set  repeatedly  and  compare  the 
individual  in  question  with  each  individual  produced.  By 
doing  that  we  would  recover  from  the  explicit  construction  of 
the  right  operand  set#  but  each  time  we  force  the  main 
composite  set  we  would  produce  all  the  individuals  of  the 
right  operand  set  which  is  very  inefficient  in  time.  So  our 
algorithm  saves  the  individuals  of  the  left  operand  set 
temporarily  and  subsequent  "Force"  operations  become 
effectively  the  same  as  forcing  the  right  operand  set. 


174 


Difference  Operation  (-) 


This  algorithm  is  almost  the  same  as  the  algorithm 
defined  for  the  intersection  operation.  In  order  to  make  the 
distinction  clear/  we  will  rewrite  step  2  of  the  algorithm 
defined  for  the  intersection  operation: 

2.  Porce  the  left  operand  set/  get  the  individual 
returned/  and  hash  with  this  individual  into  the  LHT  with  the 
integer  obtained  in  step  1  being  the  identifier.  If  there 
exists  a  record  of  this  individual  in  that  set/  force  the 
left  operand  set  again  in  order  to  continue  with  the  next 
individual  of  the  left  operand  set/  otherwise  return  this 
individual. 


Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

We  can  use  the  same  argument  as  we  have  used  in  the  time 
complexity  analysis  of  the  algorithm  for  the  "Intersection" 
operation/  and  we  can  say  the  same  things  about  the  worst 
case  asymptotical  time  complexity  behaviour  of  this 
algorithm. 

Union  Operation  (or) 

We  could  define  this  algorithm  simply  as  follows: 

1.  Force  the  left  operand  setf  get  the  individual  and 
return  it/  if  there  remains  any  individual  to  be  returned  in 
the  left  operand  set. 

2.  If  no  individuals  of  the  left  operand  set  remain  to 
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be  returned,  force  the  right  operand  set,  get  the  individual 
and  return  it,  if  there  remains  any  individual  to  be  returned 
in  the  right  operand  set. 

3.  Do  the  above  steps  as  the  main  composite  set  is 
forced. 

The  above  algorithm  may  produce  the  same  individuals 
repeatedly  if  the  operand  sets  are  not  disjoint.  So  we  will 
again  save  the  individuals  of  the  left  operand  set  while  we 
are  producing  these  individuals  in  order  to  remember  which 
individuals  were  produced  before  and  not  produce  them  again. 
The  algorithm  is  as  follows: 

1.  Take  the  next  integer  from  the  global  count,  call 
this  integer  ”1".  Force  the  left  operand  set,  get  the 
individual  returned,  hash  into  the  LHT  with  this  individual 
and,  with  the  integer  ”1”  as  the  relation  identifier, 
establish  the  record  of  this  individual  in  the  LHT  under  the 
relation  identifier  "Z".  Return  the  individual  in  question. 

2.  Repeat  step  1  as  the  main  composite  set  is  forced. 
Construct  a  set  structure  in  the  LHT  out  of  the  records  of 
the  individuals  produced,  while  these  individuals  are  being 
produced. 

3.  Do  step  2  until  no  more  individuals  of  the  left 
operand  set  remain  to  be  produced  ,  as  the  main  composite  set 
is  forced. 
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4.  if  no  individual  of  the  left  operand  set  remains  to 
be  produced  and  if  the  main  composite  set  is  forced 
subsequently,  begin  producing  the  individuals  of  the  right 
operand  set  one  at  a  time  as  the  main  composite  set  is 
forced. 

5.  For  each  individual  produced  in  the  manner  explained 
in  step  4,  hash  into  the  LHT  with  this  individual  and  with 
the  integer  I  as  the  relation  identifier.  Check  if  this 
individual  has  a  record  in  that  set;  if  so  do  step  5-a  else 
do  step  5-b. 

a.  Force  the  right  operand  set  in  order  to  continue 
with  the  next  individual  of  the  right  operand  set,  and  go  to 
step  5. 

b.  Return  the  individual  in  question. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

As  can  be  seen  the  algorithm  has  essentially  the  same 
structure  as  the  algorithms  defined  for  the  "Intersection” 
and  "Difference"  operations  and  we  can  use  the  same  kind  of 
argument  in  this  case  also;  i.e.,  the  asymptotical  time 
complexity  behaviour  of  this  algorithm  depends  on  the 
asymptotical  time  complexity  behaviour  of  the  execution  of 
the  "Force"  operation  on  the  operand  (composite)  relations. 
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D.  MEMBERSHIP  TEST  ALGORITHMS 

As  we  mentioned  earlier,  in  some  operations  we  want  to 
test  the  membership  of  an  individual  in  a  given  set.  We  may 
do  this  in  two  ways: 

1.  We  produce  the  individuals  of  the  composite  set  one 
at  a  time  and  compare  each  individual  produced  with  the 
individual  in  question. 

2.  We  may  define  a  less  costly  algorithm  for  each  kind 
of  composite  set  which  focuses  on  the  individual  and  does 
less  work  in  testing  the  membership  of  this  individual  in  the 
given  composite  set. 

The  first  method  is  a  costly  sequential  method  since,  in 
some  cases,  we  may  produce  all  the  individuals  of  the 
composite  set.  The  second  method  defines  algorithms  for 
membership  tests  on  each  kind  of  composite  set  that  we  have 
given  the  algorithm  for  in  chapter  C,  whenever  the  cost  is 
less  than  the  cost  of  using  the  method  1.  On  the  other  hand 
Method  2  uses  up  some  memory  which  can  be  reused  by 
maintaining  a  storage  pool. 

In  defining  the  algorithms  for  the  membership  test  on  the 
different  kinds  of  composite  sets  we  will  use  a  new  notation 
which  short  cuts  a  lot  of  detailed  description  that  the 
reader  got  used  to  while  reading  the  previous  sections.  New 
notation: 
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Pr (composite  set)  *  Force  the  composite  set  repeatedly 


- — set - >  »  Create  a  set  out  of  the  individuals 

produced  by  the  operation  on  the  left 
of  the  arrow,  in  the  RHT  by 
attributing  a  unique  identifier  to 
this  set,  which  is  represented  as  a 
capital  letter  on  the  right  of  the 
arrow. 

— -tx— >  »  Transmit  the  resulting  set/individual 

to  the  next  operation  as  the  argument 
of  that  operation. 

—test-each-in — >  *  Do  the  membership  test  for  each 

individual  produced  by  forcing  the 
composite  set  shown  on  the  left  of 
the  arrow,  to  see  if  it  is  in  the  set 
which  is  shown  on  the  right  of  the 
arrow. 

- any — ->  »  If  any  individual  is  found  to  be  a 

member  of  the  set  indicated  on  the 
left  of  the  arrow,  output  "true". 

while  C2;  »  While  producing  the  individuals  of 

the  set  C2,  do  the  step  indicated  on 
the  right  of  the  ";"  also. 


4 


1 


tx - >varA 


is-in— > 


true : { statement} 


false : { statement} 


ise mpty.-C 


and 

or 

z 

X 

C 


Assign  the  logical  value  (true/false) 
indicated  on  the  left  of  the  arrow, 
to  the  boolean  variable  "A*. 

If  the  individual  indicated  on  the 
left  of  the  arrow  is  in  the  set 
indicated  on  the  right  of  the  arrow, 
output  "true". 

If  the  input  condition  was  "true" 
then  the  statement  indicated  in  the 
braces  is  true. 

If  the  input  condition  was  "false" 
then  the  statement  indicated  in  the 
braces  is  true. 

If  the  given  set  C  was  the  empty  set, 
output  "true"  otherwise  output 
•false". 

Set  difference  operation. 

Set  intersection  operation/logical 
"and"  operation. 

Set  union  operation/logical  "or" 
operation. 

The  individual  to  be  tested  for 
membership. 

Argument  individual. 

Argument  set. 


180 


left (x) 
right (x) 


Left  individual  of  the  pair  "x*. 
Right  individual  of  the  pair  *x" 


In  these  algorithms  extensionally  represented  temporary 
sets  will  be  given  unique  identifiers  from  a  global  count 
which  is  different  than  the  global  counts  associated  with  the 
high  level  composite  sets  in  the  function  "Porce"  case. 
These  temporary  sets  will  be  established  in  the  RHT  instead 
of  LHT  in  order  to  prevent  the  possible  collisions  that  may 
occur  because  of  the  sets  created  by  the  function  "Force" 
since  they  both  use  integers  as  identifiers. 

In  order  to  make  the  notation  clear,  we  will  define  two 
algorithms  in  a  way  as  we  have  done  before,  and  we  will 
explain  the  correspondence  between  the  notation  and  the  steps 
of  those  algorithms.  The  algorithms  referring  to  the 
remaining  operations  can  be  found  in  Appendix  0,  which  are 
expressed  by  using  the  notation  given  above. 

(R-S) I :C 

Given  an  individual  to  be  tested  for  membership  in  the 
composite  set  (R-S)l:C,  we  have  to  find  out  if  a  tuple  of  R 
exists  which  has  this  individual  as  the  left  individual.  The 
second  condition  is  the  right  individual  of  this  tuple  must 
be  in  the  argument  set  C  and  the  last  condition  is  this 
tuple  must  not  exist  in  the  relation  S.  The  algorithm  which 
checks  those  conditions  is  given  below: 


1.  Take  the  next  integer  from  the  global  count.  (Note 
that  this  global  count  is  not  the  same  as  the  global  count 
used  in  the  function  "Force").  Force  the  unimg':R:z 
repeatedly  and  for  each  individual  obtained  do  step  2.  (*  in 
our  notation  this  step  can  be  expressed  as:  Fr (unimg ' :R:z)  *) 

2.  Hash  to  the  RHT  with  the  individual  obtained  and  with 
the  integer  taken  from  the  global  count  as  the  identifier, 
establish  the  record  of  this  individual  in  the  RHT. 

3.  Link  the  records  of  the  individuals  to  each  other  by 
their  TASE  links  as  they  are  created.  (*  Assuming  the 
integer  identifying  the  set  produced  in  the  RHT  represented 
as  Cl,  in  our  notation,  all  of  the  above  steps  can  be 
expressed  as  follows: 

tUT*"  Fr  (unimg'  :R:z)  — — set— — >  Cl  *) 

4.  Force  repeatedly  unimg':S:Z;  for  each  individual 
obtained  in  this  manner,  hash  into  the  RHT  with  this 
individual,  with  the  integer  taken  from  the  global  count  in 
step  1  being  the  relation  identifier.  (In  this  case  the 
relation  identifier  is  used  to  identify  the  set  established 
in  the  RHT).  Check  if  this  individual  has  a  record  in  that 
set;  if  so  delete  this  record  else  do  nothing  and  continue 
with  the  next  individual  of  the  unimg':S:z.  (*  In  our 
notation  this  step  can  be  expressed  as  follows: 

Fr  (unimg' :S:z)  —  set— >  C2' 

while  Cl';  Cl  -  C2  - set-- — >  D ' 


182 


where  C2'  is  a  place  holder/  because  we  are  not  constructing 


that  set  explicitly.  The  second  statement  expresses  that/ 
while  producing  the  individuals  of  the  set  C2' ,  get  the  set 
difference  (C1-C2*)  also  and  call  the  resulting  set  D'.  Note 
that  the  identifier  of  the  resulting  set  (O')  is  actually  the 
integer  that  we  took  from  the  global  count  in  step  1  which  is 
represented  as  Cl/  but  we  used  D'  as  identifier  in  order  to 
emphasize  the  fact  that  the  set  Cl  may  change  after  that 
operation.  *) 

5.  Force  the  argument  set  C  repeatedly.  Take  each 
individual  returned  and  check  if  this  individual  is  in  the 
set  resulting  from  the  execution  of  step  3  by  hashing  into 
the  RHT  with  this  individual  under  the  relation  identifier 
(integer)  obtained  in  step  1.  If  so/  conclude  that  the 
individual  "z"  is  in  the  set  (R-S)l:C  and  quit  forcing  the 
argument  set  C.  Otherwise  continue  to  force  the  set  C  in 
order  to  test  the  remaining  individuals  of  the  set  C  in  the 
manner  explained  above.  (*  In  our  notation/  this  step  can  be 
expressed  as  : 

Fr  (C) — test-each-in — >  D' — any — >true{z  is  in  the  set} 
So/  in  our  notation/  the  complete  algorithm  can  be  written  as 
follows : 

Fr (unimg1 :R:z) - set >  Cl 

Fr  (unimg' :S:z) - set >  C2' 
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while  C2 Cl  -  C2'  - set - >  D' 

Pr  (C)  —  test-each- in— >  D*  — any— >true{  z  is  in  the  set}  *) 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

We  can  write  the  time  complexity  function  of  this 
algorithm  as  follows: 

f  »  K  +  L  ♦  M 
where: 

K  corresponds  to  the  steps  1*  2  and  3. 

L  corresponds  to  step  4. 

M  corresponds  to  step  5  of  the  algorithm, 
and  K ,  L  and  M  are  not  constants  as  oppossed  to  our 
convention.  Instead  each  of  the  K,L  and  M  represents  a  term 
of  the  complexity  function.  The  worst  case  asymptotical  time 
complexity  behaviour  of  the  algorithm  is  exactly  the  same  as 
the  cost  of  the  one  of  the  terms  K,  L  or  M  indicated  above; 
i.e.f  if  the  most  of  exhaustively  forcing  the  set  Cr  (in  step 
5)  has  the  worst  case  asymptotical  time  complexity  behaviour 
of  0(n2)  and  the  other  terms  have  linear  behavior ,  the  worst 
case  asymptotical  time  complexity  behaviour  of  this  algorithm 
becomes  0(n 2),  etc. 

As  we  can  see/  the  cost  of  the  algorithm  strictly  depends 
on  the  type  of  composite  sets  we  are  forcing  in  the  various 
steps  of  the  algorithm.  In  general  we  can  say  that  the 
composite  sets  that  are  made  up  of  the  "unimg”  operation  have 
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linear  algorithms  in  the  production  of  the  individuals. 
Assuming  the  relations  R#  S  and  the  set  C  are  extensionally 
represented  we  conclude  that  the  worst  case  asymptotical  time 
complexity  of  this  algorithm  is  0(n)#  because  we  know  that 
the  algorithms  for  the  production  of  the  individuals  of  the 
composite  sets#  unimg’:R:z#  unimg':S:z  and  the  extensionally 
represented  set  C»  have  linear  behaviour  and  an  algorithm 
which  embeds  the  sequential  execution  of  those  algorithms 
will  also  have  linear  behaviour, 
unimg: (RS) :x  : 

The  algorithm  can  be  defined  as  follows: 

1.  Take  the  next  integer  from  the  global  count.  Porce 
the  composite  set#  unimg:S:x  repeatedly;  for  each  individual 

©?  obtained  in  this  manner  hash  into  the  RHT  with  this 

individual  and  with  the  integer  taken  from  the  global  count 
as  the  relation  identifier  (call  this  identifier  Cl). 
Establish  the  record  of  this  individual  in  the  RHT.  Link  the 
records  of  the  individuals  created  in  the  RHT  by  their  TASE 
links  as  they  are  created.  (*  In  our  notation#  this  step  can 
be  expressed  as  follows: 

Fr  (unimg :S:x) - set - >  Cl  *) 

2.  Porce  the  composite  set#  unimg':R:z  repeatedly;  for 
each  individual  obtained  in  this  manner  hash  into  the  RHT 
with  this  individual  and  with  the  relation  identifier  Cl.  If 
this  individual  has  a  record  in  the  set  constructed  in  step 
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1/  conclude  that  the  individual  being  tested  for  membership 
is  in  the  set,  otherwise  continue  with  the  next  individual  of 
composite  set  unimg':R:z  by  forcing  it  again.  (*  In  our 
notation ,  this  step  can  be  expressed  as  follows: 

Pr  (unimg '  :R:z) - set - >  C2 ' 

while  C2':  Cl  and  C2'  — --set-- — >  D' 

while  D'j  isempty(D')  - tx - >  false{z  is  in  the  set} 

where  C2*  is  again  a  place  holder  set  identifier ,  because  we 
are  not  constructing  the  extensional  representation  structure 
for  this  set;  instead  we  are  producing  its  individuals. 
second  statement  means:  "while  producing  the  individuals  of 
the  set  C2',  try  to  get  the  intersection  of  the  sets  Cl  and 
C2',  and  call  the  resulting  set  D'"#  which  we  will  not 
construct  the  extensional  representation  for.  The  last 
statement  means  as  soon  as  an  individual  of  the  non-existing 
set  O'  is  found,  conclude  that  the  individual  being  tested 
for  membership  is  in  the  set.  As  can  be  seen,  we  are  using 
the  identifiers  of  some  intermediate  sets  even  though  we  are 
not  representing  them  extensionally  in  order  to  make  the 
algorithms  as  understandable  as  possible.  *) 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

We  can  repeat  the  same  argument  as  we  have  done  above,  as 
follows:  It  is  clear  that  the  algorithm  has  the  worst  case 
asymptotical  time  complexity  behaviour  of  0(n)  if  the 


algorithms  for  exhaustively  forcing  unimg:S:x  and  unimg':R:z 
have  linear  time  complexity  behaviour.  So  if  we  assume  that 
the  relation  R  and  relation  S  are  extensionally  represented 
relations  we  conclude  that  the  algorithm  has  the  worst  case 
asymptotical  time  complexity  behaviour  of  0(n),  because  we 
know  that  the  algorithms  for  forcing  the  unimg:S:  and  the 
unimg':R:  have  the  linear  time  complexity  behaviour  in  the 
case  the  R  and  S  are  extensionally  represented  relations. 

We  will  not  do  the  complexity  analysis  for  the  remaining 
algorithms,  because  they  are  all  similar  to  each  other  and 
present  the  same  time  complexity  characteristics  as  the  above 
examples  by  having  disjoint  steps  and  using  the  (unimg:) 
and/or  (unimg':)  operations. 
ff*  E.  FUNCTION  APPLICATION  ALGORITHMS 

As  we  have  done  in  the  other  operations  before,  we  will 
define  an  algorithm  for  each  kind  of  composite  relation  which 
can  be  applied  to  an  individual.  In  chapter  3  we  listed  down 
the  kinds  of  composite  relations  and  mentioned  that,  we  could 
define  an  arbitrary  number  of  different  composite  relations 
by  substituting  the  composite  relations  in  each  other  as  the 
operand  relations.  Because  the  function  application 
operation  will  be  defined  for  each  kind  of  composite 
relations  in  terms  of  the  unimg,  unimg*  and  function 
application  operations  on  the  operand  relations,  and  because 
the  unimg,  unimg',  and  function  application  operations  are 


187 


defined  for  each  kind  of  composite  relation,  no  confusion 
arises.  The  reader  should  think  of  the  operand  relations  R 
and/or  S  in  each  kind  of  composite  relation  as  another 
composite  relation  or  an  extensionally  represented  relation. 
In  the  time  complexity  analysis  we  will  assume  the  operand 
relation/relations,  R  and/or  S  as  extensionally  represented 
relation/relations,  because  we  are  unable  to  do  a  complexity 
analysis  on  the  infinite  number  of  composite  relations  that 
may  be  obtained  by  substituting  the  other  composite  relations 
in  the  operand  relations/relation.  In  these  algorithms  the 
temporary  sets  are  given  integer  identifiers  from  the  same 
global  count  we  used  in  the  membership  algorithms,  because 
both  kinds  of  algorithms  are  in  the  main  body  of  the 
interpreter  and  these  sets  are  established  again  in  the  RHT. 
The  algorithms  are  given  below: 

(R&S) :x 

The  algorithm  for  this  composite  relation  can  be  defined 
as  follows: 

1.  Take  the  next  integer  from  the  global  count  and  force 
the  composite  set  unimg:R:x;  take  each  individual  returned 
and  hash  into  the  RHT  ith  this  individual,  and  with  the 
relation  identifier  being  the  integer  taken  from  the  global 
count.  Establish  the  record  of  this  individual  in  the  RHT. 
Link  the  records  of  the  individuals  to  each  other  by  their 
TASE  links  as  they  are  created. 


2.  Force  the  set  unimg:S:x  repeatedly;  take  each 
individual  returned  and  hash  into  the  RHT  with  this 
individual  and  with  the  integer  obtained  in  step  1  as  the 
relation  identifier.  If  this  individual  has  a  record  in  that 
set*  return  this  individual,  quit  forcing  unimg:S:x  and 
return  the  set  constructed  in  the  RHT  to  the  storage  pool. 
Otherwise  continue  with  the  next  individual  of  the  unimg:S:x. 

Mow  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

In  the  worst  case  the  sets  unimg:R:x  and  unimg:S:x  may  be 
disjoint.  In  that  case  we  force  the  composite  set  unimg:S:x 
exhaustively.  By  assuming  the  relations  R  and  S  are 
extensionally  represented  relations,  in  which  case  the 
exhaustive  force  operations  on  the  composite  sets  unimg:F:x 
and  unimg:S:x  have  linear  behaviour,  we  conclude  that  the 
worst  case  asymptotical  time  complexity  behaviour  of  the 
algorithm  is  0(n)  where  "n”  is  assumed  to  be  the  common 
cardinality  of  the  sets  lem:R  and  lem:S.  The  algorithm  has  a 
linear  asymptotical  time  complexity  behaviour  because  it 
incorporates  the  sequential  execution  of  two  linear 
algorithms.  Mote  that  we  have  another  linear  term  which 
corresponds  to  disconnecting  of  the  set  created  in  step  1 
from  the  RHT  and  returning  it  to  the  storage  pool.  This 
operation  is  explained  many  times  in  the  algorithms  for  the 
extensional  representation  techniques  and  is  shown  to  have 
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linear  behaviour/  so  adding  this  tern  to  the  conplexity 
function  of  the  above  algorithm  would  not  change  the 
asymptotical  time  conplexity  behaviour  of  the  algorithm. 

(R| S) :x 

Among  all  the  algorithms  that  we  will  define#  the 
simplest  one  is  the  algorithm  for  the  union  operation.  The 
algorithm  is  as  follows: 

1.  Apply  relation  "R"  to  the  argument  individual  "x";  if 
the  individual  is  found#  return  that  individual  otherwise  do 
step  2. 

2.  Apply  relation  "S"  to  the  argument  individual  "x";  if 
an  individual  is  returned#  return  this  individual  otherwise 
call  error  routine. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

We  know  that  the  algorithm  for  the  function  application 
operation  on  the  extensionally  represented  relations  has  a 
constant  time#  time  complexity  behaviour#  because  step  1  and 
step  2  are  disjoint  steps.  By  considering  the  worst  case#  in 
which  we  can  not  obtain  an  individual  by  executing  step  1#  we 
would  write  the  time  complexity  function  of  this  algorithm  as 
follows: 

f  -  K+K  -  2*K 

where  K  is  the  constant  number  of  memory  references  made  by 
the  function  application  algorithm  defined  in  the  extensional 


representation  analysis.  So  we  conclude  that  the  algorithm 
has  the  worst  case  asymptotical  time  complexity  behaviour  of 
0{c) . 

(R-S) :x 

This  algorithm  is  very  similar  to  the  algorithm  that  we 
defined  for  the  composite  relation  (R&S),  but  we  have  to 
modify  step  2  of  that  algorithm  slightly.  So  we  rewrite  seep 
2  as  below: 

2.  Force  the  set  unimg:S:x  repeatedly;  take  each 
individual  returned,  hash  into  the  RHT  with  this  individual 
and  with  the  integer  obtained  in  step  1  as  the  relation 
identifier.  If  this  individual  has  a  record  in  that  set, 
delete  this  record.  If  after  the  set  unirag :S:x  is  exhausted, 
the  set  in  the  RHT  still  has  the  records  of  some  individuals, 
return  the  individual  represented  by  the  first  record  of  that 
set.  Return  the  resulting  set  to  the  storage  pool. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

The  asymptotical  time  complexity  behaviour  of  the 
algorithm  is  essentially  the  same  as  the  asymptotical  time 
complexity  behaviour  of  the  algorithm  defined  foe 
composite  relation  (R&S),  but  the  difference  is:  m  step  . 
we  force  the  composite  set  unimg:s:x  exhaustively  jn.ie  > 
conditions.  So  the  average  case  time  complexity  i 
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of  those  algor  it  ha  a  differs  but  they  have  the  saae  worst  case 
asymptotical  tiae  coaplexity  behaviour. 

(non:R) :x 

We  can  define  this  algorithm  in  terms  of  the  previous 
algorithms  we  defined.  The  algorithm  is  as  follows: 

1.  Force  the  composite  set: 

(lem:R  -  (uniag:R:x)) 

once ,  take  the  resulting  individual  and  return  it. 

How  we  will  do  the  worst  case  asymptotical  tiae 
coaplexity  analysis  of  this  algorithm. 

We  intend  to  do  this  analysis  so  that  we  can  also  show 
the  reader  how  to  find  the  time  coaplexity  behaviour  of  a 
segment  of  a  relational  program. 

Assume  that  the  relation  R  is  represented  extensionally. 
According  to  the  definition  of  (difference)  operation  in 
the  function  "Force”,  we  force  the  composite  set  uniag:R:x 
repeatedly,  and  construct  a  set  in  the  LHT  out  of  the 
individuals  returned,  because  we  assumed  that  the  relation  R 
is  represented  extensionally.  In  order  to  construct  this 
set,  we  make  a  number  of  memory  references  proportional  to 
”n”,  where  "n"  is  the  cardinality  of  the  LBM  set  of  the 
relation.  Then  we  force  the  composite  set  lea-R,  and  we  test 
if  the  individual  returned  is  in  that  set:  if  so  we  return 
this  individual:  otherwise  we  continue  to  do  the  same  thing 


for  the  next  individual  of  the  len:R  by  forcing  the  lea:R 
again.  Because  in  the  worst  case: 
lea:R  *  uniag:R:x 

We  force  the  leatR  until  no  aore  individuals  reaain  to  be 
produced,  and  we  aake  a  nuaber  of  aeaory  references 
proportional  to  *n",  where  "n"  is  again  the  cardinality  of 
the  LEM  set  of  the  relation  R.  Because  those  two  exhaustive 
sequences  of  "Force"  operations  are  aade  one  after 
another  (i.e.,  the  steps  are  disjoint),  the  teras  of  the 
coaplexity  function  associated  with  those  steps  should  be 
added  rather  than  aultiplied,  so  the  resulting  coaplexity 
function  will  have  linear  behaviour.  Under  those 
considerations,  we  conclude  that  the  worst  case  asyaptotical 
tiae  coaplexity  behaviour  of  the  algor itha  is  0(n). 

Mote  that  we  do  not  need  this  coaplex  algor  itha  if  the 
relation  R  is  extensionally  represented  because,  like  we  have 
done  in  the  uniag  and  iaage  operations  on  the  coapleaent  of  a 
relation,  we  can  define  an  algoritha  which  assuaes  the  l's  of 
the  incidence  vector  as  0's  and  vice  versa.  We  will  define 
that  algoritha  next,  but  we  want  to  eaphasise  again  that  the 
above  algoritha  refers  to  the  general  case  where  Rcan  be  any 
coaposite  relation,  but  we  have  to  assuae  R  as  an 
extensionally  represented  relation  in  order  to  be  able  to  do 
the  tiae  coaplexity  analysis. 


(nonsR) :x  (where  R  is  Represented  Extensionally) 

This  algorithm  is  different  from  the  algorithm  defined 
for  R:x  in  the  extensional  representations  analysis.  It  can 
be  defined  as  follows: 

1.  Bash  to  the  RHT  with  the  argument  individual  under 
the  given  relation  identifier;  find  the  record  of  this 
individual. 

2.  Start  from  the  beginning  of  the  RIM  set  of  this 
relation  and  proceed  in  this  set  record  by  record  by 
following  the  TASK  links  between  the  records. 

3.  Por  each  pair  of  individuals  (tuple)  found  in  steps  1 
and  2,  reference  the  incidence  vector  and  return  the  left 
individual  of  the  first  tuple  for  which  a  0  is  found  in  the 
corresponding  incidence  vector  location. 

Mow  we  do  the  worst  case  asymptotical  time  complexity 
analysis  of  this  algorithm. 

In  the  worst  case  all  the  left  members  of  the  relation 
may  be  in  relation  with  the  argument  individual  which  is  in 
the  RIM  set  of  the  relation.  In  that  case  we  trace 
exhaustively  the  LSM  set  of  the  relation  and  make  a  number  of 
memory  references  proportional  to  "n”,  where  "n*  is  the 
cardinality  of  the  LBM  set  of  the  relation.  So  the  algorithm 
has  the  worst  case  asymptotical  time  complexity  behaviour  of 
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(R||S) :x 

The  algorithm  for  this  composite  relation  can  be  defined 
as  follows: 

1.  Extract  the  left  individual  of  the  argument 
individual  "x"  (which  is  necessarily  a  pair)#  apply  relation 
R  to  this  individual#  and  save  the  individual  returned. 

2.  Extract  the  right  individual  of  the  argument 
individual  "x"#  apply  relation  S  to  this  individual#  and  save 
the  individual  returned. 

3.  Construct  a  pair  relation  out  of  the  individuals 
saved  in  step  1  and  step  2#  and  return  the  pointer  to  the 
record  of  this  pair  (  which  is  established  in  the  relation 
table)  to  the  caller. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

We  know  that  the  algorithm  for  the  function  application 
on  an  extensionally  represented  relation  has  the  asymptotical 
time  complexity  behaviour  of  0(c)#  and  we  invoke  this 
algorithm  two  times  in  steps  1  and  2.  In  addition  we 
construct  the  pair  relation  out  of  the  individuals  resulting 
from  those  function  applications  in  constant  time.  Because 
the  steps  1#  2  and  3  are  disjoint  steps#  the  terms  of  the 
complexity  function  associated  with  those  steps  are  added# 
and  the  worst  case  asymptotical  time  complexity  behaviour  of 
this  algorithm  automatically  becomes  0(c). 
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(R#S) :x 


The  algorithm  for  this  composite  relation  is  similar  to 
the  algorithm  for  the  composite  relation  (R|  |  S) ,  except,  we 
apply  relations  R  and  S  directly  to  the  argument  individual 
■x"  in  steps  1  and  2;  obviously,  it  has  the  same  asymptotical 
time  complexity  behaviour  as  the  algorithm  for  comnposite 
relation  (R||S). 

(RS) sx 

The  algorithm  for  this  composite  relation  can  be  defined 
as  follows: 

1.  Apply  the  relation  S  to  the  argument  individual  "x"; 
take  the  individual  returned  and  call  it  ”y". 

2.  Apply  the  relation  R  to  the  individual  "y”?  take  the 
individual  returned  and  return  it. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

We  know  that  the  function  application  costs  us  constant 
time  if  the  relation  in  question  is  represented 
extensionally,  so  the  steps  1  and  2  of  the  above  algorithm 
cost  us  constant  time  each.  Because  the  steps  are  disjoint, 
we  conclude  that  the  worst  case  asymptotical  time  complexity 
behaviour  of  this  algorithm  is  also  constant  time. 

Meta  Application  ((R::S):x) 

We  did  not  define  an  algorithm  for  this  operation  in  the 
extensional  representations  analysis  because  it  was  hard  and 


infeasible  to  construct  tbs  •xtsnsional  reprsssntation 
structure  for  this  case.  Before  we  go  into  the  reasons  for 
doing  that,  we  will  summarize  what  the  operation  does. 

This  operation  applies  the  right  operand  relation  to  the 
argument  individual  and  records  the  individual  obtained,  then 
it  applies  the  left  operand  relation  to  the  argument 
individual.  Zf  the  individual  obtained  is  a  relation,  it 
applies  this  relation  to  the  individual  recorded  above  and 
returns  the  resulting  individual. 

Mow  suppose  we  tried  to  construct  the  extensional 
representation  structure  for  this  operation,  we  would  need 
to  apply  each  of  the  relations  known  by  the  system  so  far  to 
all  individuals  known  by  the  system,  and  we  would  construct 
the  RIM  set  of  this  relation  out  of  the  individuals  resulting 
from  those  application  operations.  Then  we  would  need  to 
apply  each  of  the  relations  known  by  the  system  to  all  of  the 
individuals  of  the  RZM  set  of  the  relation  and  construct  the 
UN  set  of  this  relation  out  of  the  individuals  resulting 
from  those  application  operations.  As  can  be  easily  seen, 
the  process  is  very  costly  and  the  resulting  relation  should 
be  updated  as  soon  as  a  new  relation  and/or  a  new  individual 
is  introduced  to  the  system.  On  the  other  hand,  if  we  do 
this  operation  intensionally,  no  problem  arises.  The 
algorithm  for  this  operation  is  given  below: 


1*  Apply  the  right  operand  relation  to  the  argument 
individual}  record  the  individual  returned. 

2.  Apply  the  left  operand  relation  to  the  argument 
individual  and  take  the  individual  returned,  hash  into  the 
relation  table  with  this  individual  and  check  if  it  is  a 
relation.  In  the  case  this  relation  is  not  in  the  relation 
table  but  represented  intensionally  (i.e.,  if  it  is  a 
composite  relation),  hash  into  the  relation  table  with  each 
relation  identifier  out  of  which  this  composite  relation  is 
constructed.  In  any  case,  if  this  relation  is  found  to  be 
applicable  to  the  argument  individual  in  question,  apply  this 
relation  to  the  individual  recorded  in  step  1  and  return  the 
resulting  individual.  Otherwise,  call  the  error  routine. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

Let  the  left  operand  relation  (R)  and  right  operand 
relation  (S)  be  extensionally  represented  relations.  We  know 
that  the  function  application  operation  on  extensionally 
represented  relations  has  the  constant  time  asymptotical 
time  complexity  behaviour.  As  a  result,  step  1  and  step  2 
make  a  constant  number  of  memory  references  so  we  conclude 
that  the  worst  case  asymptotical  time  complexity  behaviour  of 
this  algorithm  is  0(c)  in  the  case  the  operand  relations  are 
extensionally  represented  relations. 
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(fan:R)  :x 

This  operation  is  an  expensive  operation  relative  to  the 
other  function  application  operations  on  the  various 
coaposite  relations.  The  algorithm  is  as  follows: 

1.  Porce  the  composite  set: 

lem:R|ria:R 

repeatedly,  count  the  number  of  individuals  produced,  i.e., 
obtain  the  cardinality  of  the  MEM  set  of  the  relation,  and 
call  this  *M*. 

2.  Apply  relation  R  to  the  argument  individual  and  take 
the  individual  obtained,  apply  the  relation  R  to  this 
individual  again  and  take  the  individual  obtained.  Repeat 
the  application  operation  in  the  same  manner  by  each  time 
applying  the  relation  R  to  the  individual  obtained  from  the 
previous  application  operation  M  times  or  until  an 
application  operation  returns  "nil". 

3.  In  the  first  case  when  the  application  operation  is 
repeated  M  times,  return  the  last  individual  obtained.  In 
the  second  case  when  an  application  operation  returns  "nil", 
return  the  individual  obtained  from  the  previous  application 
operation  which  has  not  returned  "nil". 

Mow  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

We  know  that  the  term  of  the  time  complexity  function 
corresponding  to  step  1  of  the  algorithm  has  the  linear  time 
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complexity  behaviour  because  in  that  step  we  make  a  number  of 
memory  references  proportional  to  M  (where  M  is  the 
cardinality  of  the  MBM  set  of  the  relation  in  question) ,  in 
the  case  the  relation  R  is  an  extensionally  represented 
relation.  The  term  of  the  complexity  function  corresponding 
to  step  2  also  has  linear  time  complexity  behaviour ,  because 
each  function  application  operation  makes  a  constant  number 
of  memory  references  and  we  repeat  the  function  application 
operation  M  times  in  the  worst  case.  So  in  step  2  we  make  a 
number  of  memory  references  proportional  ta  M,  where  M  is  the 
cardinality  of  the  MBM  set  of  the  relation  in  question. 
Because  steps  1  and  2  are  disjoint  steps  we  add  the  terms  of 
the  complexity  function  corresponding  to  step  1  and  step  2 
together  and  obtain  a  linear  time  complexity  function.  So  we 
conclude  that  the  worst  case  asymptotical  time  complexity 
behaviour  of  this  algorithm  is  0(n)  where  Nn"  is  the 
cardinality  of  the  MBM  set  of  the  relation  in  question. 
(san:R) :x 

This  operation  is  not  a  well  defined  operation  because, 
given  the  argument  individual,  we  can  check  if  it  is  a  member 
of  the  NBN  set  of  the  relation  in  question  and  if  it  is,  we 
return  the  argument  individual  itself  as  the  resulting 
individual.  This  property  of  the  operation  originates  from 
the  fact  that  the  second  ancestral  of  a  relation  is  the 
reflexive  transitive  closure  of  the  relation.  Because  the 


second  ancestral  of  a  relation  has  to  be  a  reflexive 
relation.  If  the  given  (argument)  individual  is  in  the  RIM 
set  of  the  relation/  it  has  to  be  in  the  LEM  set  also.  So 
after  doing  the  above  membership  test/  we  can  immediately 
return  the  argument  individual  itself  as  the  resulting 
individual.  Thus  this  operation  does  not  have  any  meaning 
from  the  user*s  point  of  view. 

Even  though  it  does  not  have  meaning/  we  will  use  this 
operation  as  a  part  of  another  operation/  which  is  equivalent 
to  "while"  loop  in  conventional  languages.  This  operation  is 
explained  below. 

C/ ( (sansR) tx) 

This  operation  can  be  viewed  as  a  while  loopr  in  which 
the  left  restriction  operation  imposes  the  condition  of  the 
loop  and  "san:"  operation  forces  the  loop  to  iterate.  Since 
we  did  not  define  the  "san:”  operation  we  will  accept  this 
operation  like  a  completely  new  operation  and  we  will  define 
an  algorithm  for  it.  The  algorithm  is  as  follows: 

1.  Test  if  the  argument  individual  x  is  in  the  MEM  set  of 
the  relation  R. 

2.  if  it  is  in  the  MEM  set  of  the  relation  Rr  test  if  it 
is  in  the  set  C;  return  this  individual  as  the  result; 
otherwise/  do  step  4. 

3.  If  it  is  not  in  the  MEM  set  of  the  relation  then  call 


the  error  routine 


4.  Apply  relation  R  to  the  individual  x;  take  the 
individual  returned  and  test  if  it  is  in  the  set  C.  If  so 
return  this  individual  else  do  step  5  for  this  individual. 

5.  Apply  relation  R  to  the  individual;  take  the 
individual  returned  and  test  if  it  is  in  the  set  C.  If  so 
return  this  individual  else  repeat  the  step  for  this 
individual. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

As  it  is  in  the  "while”  loop  of  conventional  languages, 
if  the  condition  is  never  met  the  algorithm  goes  into  an 
infinite  loop.  As  can  be  seen  we  perform  one  function 
application  operation  and  one  membership  test  operation  in 
each  iteration  of  the  loop.  So  the  worst  case  asymptotical 
time  complexity  behaviour  of  the  algorithm  is  the  same  as  the 
worst  case  asymptotical  time  complexity  behaviour  of  either 
the  membership  test  operation  or  the  function  application 
operation  (depending  on  which  operation  is  more  expensive), 
times  the  number  of  iterations. 

If  we  assume  that  the  set  C  and  the  argument  relation  R 
are  represented  extensionally,  it  is  obvious  that  each 
iteration  costs  us  constant  time  because  the  membership  test 
operation  is  a  constant  time  operation  when  the  set  in 
question  is  represented  extensionally.  In  the  same  way,  when 
the  relation  is  represented  extensionally,  the  function 


application  operation  becomes  a  constant  time  operation.  So 
the  only  variable  remaining  to  be  taken  into  account  is  the 
number  of  iterations.  We  conclude  that  the  algorithm  has  the 
worst  case  asymptotical  time  complexity  behaviour  (in  the 
case  of  the  set  C  and  the  relation  R  represented 
extensionally)  of  0(n),  where  n  is  the  number  of  iterations. 

In  the  same  manner  we  can  define  an  operation  which  is 
equivalent  to  the  "repeat"  loop  in  the  conventional 
languages.  This  operation  is  explained  below. 

C/((fan:R)  :x) 

Even  though  we  defined  an  algorithm  for  the  operation 
"(fan;R):x",  we  can  not  make  use  of  it  in  this  case.  We  have 
to  accept  this  operation  as  a  stand  alone  operation.  Because 
the  operation  "(fan:R):x"  finds  an  individual  to  be  returned 
as  a  result  and  quits  at  that  point;  on  the  other  hand  in 
this  operation  we  want  the  loop  to  continue  if  the  resulting 
individual  is  not  an  element  of  the  set  C.  The  algorithm  lor 
this  operation  is  the  same  as  the  algorithm  of  the  operation 
"C/( (fan:R) sx) "  except,  we  do  not  include  the  step  1,  2  and  3 
of  that  algorithm.  So  the  same  worst  case  asymptotical  time 
complexity  analysis  can  be  done  in  this  case  too. 

Reduction  Operation  (@s (R,f ) ) : (i,x) 

This  operation  is  aimed  at  reducing  sequences.  The 
operation  takes  a  sequence  R,  a  function  f  (which  takes  a 
pair  and  returns  an  individual),  an  initial  value  i,  and  the 


first  individual  of  the  sequence  x.  It  applies  the  function 
f  to  the  argument  pair,  takes  the  resulting  individual,  and 
constructs  a  new  pair  in  which  the  left  component  is  the 
individual  obtained  above  and  the  right  component  is  the  next 
individual  of  the  sequence  R.  This  new  pair  goes  under  the 
same  process  as  the  argument  pair  did  and  this  process 
continues  until  the  end  of  the  sequence  is  encountered,  we 
will  give  an  example  to  make  the  operation  clear  to  the 
reader.  Example: 

Suppose  the  sequence  R  (a  relation)  is  defined  as  the 
integers  from  1  to  9  (which  has  the  tuples  like:  <1,2>, 
<2,3>,  <3,4>,  ..  and  so  on)  and  the  function  f  is  defined  as 
the  addition  (+);  i.e.,  it  takes  a  pair,  adds  up  the  left  and 
right  components  and  returns  the  result.  Suppose  the  initial 
value  is  given  as  0  and  the  first  individual  of  the  sequence 
is  given  as  1.  (In  some  cases  we  may  want  to  begin  with 
another  individual  of  the  sequence  depending  on  the 
application).  So  the  operation  to  be  performed  is: 

(@: (R,+) ) : (0,1) 

The  operation  first  adds  1  to  0  and  looks  up  the  next 
individual  of  the  sequence  which  is  2.  It  constructs  the 
pair  (1,2)  and  applies  function  f  to  this  pair  again.  It 
takes  the  result  (3),  looks  up  the  next  individual  of  the 
sequence  which  is  3  and  constructs  the  new  pair  (3,3).  it 
continues  in  the  same  manner  until  it  creates  the  pair 


(45,eos),  where  "eoa"  represents  the  end  o£  the  sequence.  At 
this  point  it  returns  the  result  (45) . 

We  define  the  algorithm  of  this  operation  as  follows: 

1.  Get  the  identifiers  of  the  sequence  R  (a  relation) 
and  the  function  f  (a  relation). 

2.  Get  the  arguaent  pair. 

3.  Apply  the  function  f  to  the  arguaent  pair;  take  the 
individual  returned  and  call  it  W. 

4.  Apply  the  sequence  R  to  the  right  coaponent  of  the 
pair;  take  the  individual  returned  and  call  it  Z. 

5.  If  the  individual  obtained  in  step  4  is  the  end-of- 
sequence  mark,  then  do  step  7.  Else  construct  a  pair  in 
which  the  right  coaponent  individual  is  Z  and  the  left 
coaponent  individual  is  W. 

6.  Go  to  step  3  with  the  pair  constructed  in  step  5 
being  the  arguaent  pair. 

7.  Take  the  left  coaponent  of  the  pair  and  return  it. 

Now  we  will  do  the  worst  case  asymptotical  tiae 

complexity  analysis  of  this  algorithm. 

As  can  be  sen  we  perform  two  function  appliction 
operations  for  each  individual  of  the  sequence.  If  the 
sequence  R  and  the  function  f  are  extensionally  represented 
relations ,  then  the  function  application  operations  on  those 
relations  becoae  constant  tiae  operation.  So  we  Bake  a 
constant  number  of  memory  references  for  each  individual  of 


the  sequence.  Under  this  consideration  we  conclude  that  the 
worst  case  asymptotical  tiae  complexity  behaviour  of  this 
algorithm  is  0(n),  where  n  is  the  nuaber  of  individuals  in 
the  sequence. 

Rc:x 

In  the  preprocessing  phase  many  of  the  converse  composite 
relations  such  as  (RjC)c,  (R*S)c,  etc.  reduce  down  to  the 
primitive  converse  relations  such  as  Rc,  Sc,  where  R  and  S 
are  represented  extensionally.  Hence  we  have  to  define  the 
algorithm  for  the  function  application  operation  which  works 
on  the  converse  of  an  extensionally  represented  relation. 
The  algorithm  is  as  follows: 

1.  Hash  to  the  LHT  with  the  argument  individual  under 
the  relation  identifier  in  question.  Find  the  record  of  this 
individual  and  follow  the  pointer  found  in  the  PRRM  field  of 
this  record.  Find  the  RHT  record  of  the  individual  which  is 
in  relation  with  the  argument  individual  under  the  relation 
in  question. 

2.  Return  the  individual  which  is  represented  by  the  RHT 
record  found  in  step  1. 

How  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

It  is  obvious  from  the  algorithm  that  we  make  a  constant 
number  of  memory  references  for  doing  this  operation.  So  the 


algorithm  has  tha  worst  case  asymptotical  time  complsxity 
behaviour  of  0(c) *  as  it  was  in  tha  (R:x)  case. 

Soma  convarsa  composite  operations  can  not  be  reduced  to 
the  primitive  operation  given  above*  and  should  have 
specially  defined  algorithms;  those  algorithms  are  given 
below: 

(R| |S)c:x 

The  algorithm  for  this  composite  relation  is  as  follows: 

1.  Apply  the  relation  "Rca  to  the  left  individual  of  the 
argument  individual  "x*  (which  is  necessarily  a  pair)*  get 
the  individual  returned  and  save  it. 

2.  Apply  the  relation  *8c"  to  the  right  individual  of 
axa,  get  the  individual  returned  and  save  it. 

3.  Construct  a  pair  relation  out  of  the  individuals 
saved  in  steps  1  and  2;  return  the  pointer  to  the  record  of 
this  pair  relation  which  has  been  established  in  the  relation 
table. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

If  the  relations  R  and  S  are  extensionally  represented 
relations*  we  make  a  constant  number  of  memory  references  in 
steps  1  and  2.  In  addition  we  know  that  we  construct  a  pair 
relation  by  making  the  constant  number  of  memory  references. 
So  in  the  above  algorithm  we  make  a  constant  number  of  memory 
references  in  order  to  obtain  an  individual  as  a  result. 


Under  the  above  consideration  we  conclude  that  the  algorithm 
has  the  worst  case  asymptotical  time  complexity  behaviour  of 
0(c) . 

(R#S)ctx 

This  algorithm  is  more  expensive  than  the  algorithm  for 
the  (R|jS)c:x,  because  the  intersection  of  the  set  of 
individuals  that  are  in  relation  with  the  left  individual  of 
the  argument  "x”  (which  is  necessarily  a  pair)  under  the 
relation  R,  and  the  set  of  individuals  in  relation  with  the 
right  individual  of  the  argument  ”x"  under  the  relation  S 
should  contain  at  least  one  individual  which  is  to  be 
returned  as  a  result.  So  in  this  case  we  have  to  execute  the 
"Unit  image”  operation  on  the  left  individual  and  on  the 
right  individual  of  the  argument  individual  (pair)  ”x”.  The 
algorithm  is  as  follows: 

1.  Force  the  composite  set: 

(unimg' :R: (left(x) ) )  and  (unimg' :S: (right  (x) ) ) 
once  and  return  the  resulting  individual. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

Assume  the  relation  R  and  the  relation  S  are 
extensionally  represented  relations.  Because  in  the  worst 
case  (when  the  unimg':R:left(x)  and  the  unimg':S:right(x)  are 
disjoint)  we  produce  all  the  individuals  of  the  composite 
setsr  each  unimg*  operation  being  exhaustively  forced  makes  a 
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number  of  memory  references  proportional  to  the  cardinality 
of  the  RXM  set  of  the  relation  in  question  (i.e.,  R  or  S). 
Zn  defining  the  algorithms  related  with  each  case  of  function 
"Force"  we  have  shown  that  the  above  expression  (composite 
set)  could  be  executed  in  linear  time,  because  of  the  way  we 
define  the  algorithm  for  the  "Set  intersection"  operation. 
So  the  above  algorithm  also  has  the  worst  case  asymptotical 
time  complexity  behaviour  of  0(n)  where  "n"  is  the  maximum  of 
the  RZN  set  cardinalities  of  the  relations  R  and  S. 

(nonxR)ctx 

The  algorithm  for  this  case  is  as  follows: 

1.  Force  the  composite  set: 

((rim:R)  -  (unimg*  :R:x) ) 

once:  get  the  individual  returned  and  return  it. 

Row  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

Zn  the  worst  case  the  set  (rim:R)  becomes  exactly  equal 
to  the  set  (unimg* :R:x) ,  in  this  case  we  have  to  produce  all 
the  individuals  of  the  set  (rim:R)  in  addition  to  the 
individuals  of  the  set  (unimg':R:x) .  So  we  can  write  the 
complexity  function  of  this  algorithm  as  follows: 

f  *  R*n  ♦  L*n  ♦  C 
where: 

K  *  The  constant  number  of  memory  references  made  for 
obtaining  each  individual  of  the  set  (rim:R). 
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L  *  The  constant  number  of  memory  references  made  for 
obtaining  each  individual  of  the  set,  (unimg' :R:x) . 

C  *  The  constant  number  of  overhead  memory  references. 

n  *  The  cardinality  of  the  RIM  set  of  the  relation  R. 

By  looking  at  the  above  function  we  conclude  that  the  worst 
case  asymptotical  time  complexity  behaviour  of  this  algorithm 
is  0(n)*  where  ”n”  is  the  cardinality  of  the  RIM  set  of  the 
relation. 

The  change  in  the  time  complexity  behaviour  of  some 
algorithms  in  the  case  the  relations  are  restricted  to  be 
injective  will  be  inspected  next. 

If  we  restrict  the  operand  relations  of  some  composite 
relations  to  be  the  injective  relations*  the  function 
application  operation  makes  a  constant  number  of  memory 
references  while  working  on  those  composite  relations. 

Function  application  algorithms  on  those  composite 
relations  are  as  follows: 

(R6S) :x 

1.  Apply  R  to  the  "x",  save  the  individual  returned. 

2.  Apply  S  to  the  *x",  save  the  individual  returned. 

3.  Compare  the  individuals  saved  in  steps  1  and  2;  if 
they  are  the  same*  return  this  individual;  otherwise  call  the 
error  routine. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 


Ac  we  know  the  function  application  operation  on  the 
extensionally  represented  relations  has  a  constant  tine 
behaviour  so  in  steps  1  and  2  we  make  a  constant  number  of 
memory  references.  Because  step  3  does  only  one  comparison  we 
conclude  that  the  worst  case  asymptotical  time  complexity 
behaviour  of  this  algorithm  is  0(c).  Mote  that  this  is  true 
in  the  case  the  relations  R  and  3  are  extensionally 
represented  relations. 

(R-S) sx 

1.  Do  steps  1  and  2  of  the  algorithm  defined  for  the 
composite  relation  (RftS)  above. 

2.  Compare  the  individuals  saved;  if  they  are  not  the 
same,  return  the  individual  obtained  by  applying  the  relation 
R  to  the  argument  individual  "x”;  otherwise  call  the  error 
routine. 

Mow  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

The  algorithm  has  the  same  asymptotical  time  complexity 
behaviour  as  the  algorithm  defined  for  the  composite  relation 
(RSS),  because  the  same  argument  applies. 

F.  ALTERNATIVE  METHOD  FOR  6EMERATING  INDIVIDUALS  OF 

COMPOS  ITS  SETS 

In  defining  the  algorithms  for  the  cases  of  the  force 
primitive  we  used  the  unimgs  operation  as  a  primitive 
operation.  When  the  unimg:  operation  works  on  an 


extensionally  represented  relation,  we  know  that  the 
algorithm  for  the  unimg:  operation  has  linear  time  complexity 
behaviour.  On  the  other  hand,  if  the  relation  is  a  complex 
compound  relation,  this  primitive  operation  may  coat  us  more, 
for  example,  the  unimg:  operation  on  the  compound  relation  RS 
(where  St  and  S  are  extenaionally  represented  relations)  has  a 
worst  case  asymptotical  time  complexity  behaviour  of  0(n2). 
We  defined  the  algorithms  of  the  cases  of  the  force  primitive 
for  the  operator  pairs  that  must  be  specially  treated  and  we 
had  a  total  of  27  algorithms. 

In  defining  the  alternative  algorithms  for  the  cases  of 
the  force  primitive  we  will  define  an  algorithm  for  each 
operator  rather  than  each  operator  pair.  This  may  seem  to 
the  reader  more  efficient  than  our  previous  method  and  the 
reader  may  naturally  think  that  by  using  this  method  we  will 
reduce  the  number  of  cases  that  we  have  to  define  algorithms 
for,  but  this  is  not  true  in  our  case.  In  defining  the 
algorithms  for  our  interpreter,  we  mentioned  the  concept  of 
generalisation,  and  we  defined  our  algorithms  in  terms  of 
five  primitive  operations.  It  turned  out  that  we  could 
reduce  some  compound  sets  to  the  other  kinds  of  compound  sets 
in  the  preprocessing  phase.  Hence  we  could  express  some 
composite  sets  which  are  constructed  by  using  a  relational 
operation  in  terms  of  the  other  composite  sets  that  are 
constructed  by  ucing  one  or  more  of  the  five  primitive 


relational  operations.  So  we  did  not  have  to  include  the 
algorithms  for  the  operator  pairs  involving  that  kind  of 
relational  operators  that  is  reducible  in  the  preprocessing 
phase.  This  allowed  us  to  reduce  the  number  of  cases  of  the 
"Force”  primitive  to  27. 

In  the  second  method  we  produce  the  tuples  of  the 
compound  relations  and  do  the  primitive  operations  on  these 
tuples  by  defining  an  algorithm  for  each  operator  that 
constructs  a  compound  relation  and  by  defining  an  algorithm 
for  each  primitive  operator  (i.e.,  Is,  unimgs,  lems,  rims) 
which  yields  a  compound  set  when  combined  with  a  compound 
relation.  Because  any  operand  relation  in  a  compound 
relation  may  be  another  compound  relation,  we  have  to  define 
an  algorithm  for  each  of  the  operators  which  constructs  a 
compound  relation  and  which  was  reducible  in  the 
preprocessing  phase  (in  the  case  of  the  previous  method 
used).  In  addition  some  of  the  cases  of  the  "Force” 
primitive  has  to  be  included  in  the  cases  defined  for  the  new 
method,  so  we  will  have  totally  36  algorithms  for  the  36 
cases  defined  for  the  new  method  instead  of  27  cases  defined 
for  the  previous  method.  The  first  question  we  have  to  ask 
ourselves  ist  What  are  the  efficiencies  associated  with  the 
new  method  that  motivate  us  to  investigate  it?  The  important 
efficiency  of  the  new  method  is  no  matter  how  complex  a 
composite  set  is,  the  operation  producing  the  individuals  of 


this  set  has  ths  worst  case  asymptotical  time  complexity 
behaviour  of  0(n2).  We  will  make  this  fact  clear  in  defining 
the  algorithms  for  the  operators.  In  our  previous  method# 
depending  on  the  cost  of  unimg:  operation  on  the  compound 
relation  in  question#  the  cost  of  an  algorithm  defined  for  a 
case  of  the  "Force"  primitive  can  increase  arbitrarily#  but 
in  this  case  the  asymptotical  time  complexity  behaviour  of  an 
algorithm  is  fixed.  On  the  other  hand  all  of  the  algorithms 
defined  for  the  new  method  have  the  worst  case  asymptotical 
time  complexity  behaviour  of  0(n2),  while  we  had  some  0(n) 
algorithms  with  previous  methods#  note  that  these  algorithms 
have  a  linear  time  complexity  behaviour  in  the  case  the 
operand  relations  of  the  compound  sets  are  represented 
extensionally.  Again  the  cost  of  these  algorithms  may 
increase  arbitrarily  depending  on  the  kind  of  compound 
relations  and  the  cost  of  the  unimg:  operation  on  those 
compound  relations. 

In  our  new  method  we  will  define  a  new  primitive 
function#  namely  "Force  2",  which  works  exactly  the  same  as 
the  "Force”  primitive.  The  distinction  is#  it  produces  the 
tuples  of  the  compound  relations  instead  of  the  individuals 
of  compound  sets.  The  state  saving  mechanism  works  almost  in 
the  same  way  as  we  defined  for  the  "Force”  primitive#  but  in 
this  case  we  have  two  pointers  to  be  saved  instead  of  one. 
Bach  case  of  this  primitive  corresponds  to  a  relational 


operation  that  constructs  a  coapound  relation  out  of  the 
operand  relations.  The  "Force  2"  primitive  is  forced  with  a 
coapound  relation  being  the  argument.  Then  the  appropriate 
case  of  the  "Force  2"  primitive  is  invoked  and  the  algorithm 
defined  for  this  case  divides  this  compound  relation  into 
simpler  coapound  relations  and  calls  the  "Force  2"  primitive 
recursively  with  each  of  the  coapound  relations  created  being 
the  argument  (i.e.f  it  calls  the  "Force  2"  primitive  with 
only  one  coapound  relation  at  a  time).  This  process 
continues  until  an  extensionally  represented  relation  is 
forced,  in  which  case  the  algorithm  below  applies: 

1.  Hash  into  the  relation  table  with  the  relation 
identifier;  find  the  record  of  the  relation. 

2.  Follow  the  pointers  found  in  the  PFLM  and  PPRM  fields 
of  this  record  and  find  the  records  of  the  first  left  member 
and  the  first  right  member. 

3.  Put  the  pointer  to  the  first  left  member's  record 
into  the  left  field  of  the  record  structure  to  be  returned 
and  put  the  pointer  to  the  first  right  member's  record  into 
the  right  field  of  the  record  structure  to  be  returned. 
(*  The  tuples  are  returned  to  the  higher  levels  by  using  the 
record  structure  shown  in  Figure  16.  *) 


left 
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Figure  16.  The  conveying  record  structure. 

4.  Take  the  next  integer  from  the  global  count 
associated  with  the  high  level  compound  relation#  advance  the 
pointer  which  is  currently  pointing  at  the  first  right 
member's  record  to  the  next  record  of  the  RIM  set#  and  put 
this  pointer  into  the  right  field  of  the  record  (which  has 
the  above  structure)  which  will  be  saved  in  the  hash  table 
associated  with  the  "Force  2"  primitive.  Put  the  pointer 
pointing  at  the  current  LBM  set  record  into  the  left  field  of 
this  record.  Bash  to  the  hash  table  which  we  will  call  MHASH 
2#  with  the  integer  obtained  above  being  the  identifier#  and 
save  the  above  record  in  this  hash  table  under  this 
identifier.  (*  The  high  level  compound  relation  mentioned 
above  is  analogous  to  the  high  level  compound  set  that  we 
defined  in  the  "Force”  primitive  case.  *) 

5.  if  a  subsequent  force  is  addressed  to  this  relation 
associated  with  the  same  high  level  compound  relation#  take 
the  next  integer  from  the  global  count  associated  with  this 
high  level  compound  relation  (which  should  be  the  same  as  the 
integer  found  in  step  4)#  hash  to  the  MHASH  2  table  with  this 


integer  as  the  identifier  and  return  the  record  found  under 
this  identifier.  Decrement  the  global  count  and  repeat  step 
4. 

6.  if  in  any  force  operation  the  pointer  proceeding  in 
the  RIM  set  of  the  relation  reaches  the  end  of  the  RIM  set 
and  can  not  proceed  further,  reset  this  pointer  to  the 
beginning  of  the  RIM  set  and  advance  the  pointer  pointing  at 
the  current  record  in  the  LEM  set  to  the  next  record  of  the 
LEM  set.  If  the  LEM  set  is  also  exhausted,  return  "nil”  to 
the  caller  (which  is  the  function  Force  2  itself)  and  save 
0's  in  the  left  and  right  fields  of  the  record  associated 
with  this  level  instead  of  pointers. 

Hence  the  case  associated  with  the  extensionally 
represented  relations  returns  the  tuples  of  the  relation  in 
question,  one  at  a  time  as  "Force  2"  is  forced  repeatedly 
with  the  same  relation  identifier  as  the  argument.  If  a 
tuple  does  meet  the  conditions  imposed  by  the  cases  that  are 
involved  in  the  path  of  recursion,  it  is  returned  by  the 
"Force  2"  primitive  as  a  tuple  of  the  high  level  compound 
relation. 

Now  we  will  define  the  algorithms  for  the  cases  of  the 
"Force  2"  primitive.  In  these  algorithms  we  will  refer  to 
the  membership  test  algorithms  that  have  not  been  defined 
yet.  Me  will  define  those  algorithms  later.  Those 
algorithm;  will  be  associated  with  the  operators  that 


construct  compound  relations.  The  membership  test  operations 
on  relations  are  done  in  exactly  the  same  way  that  we 
explained  in  the  compound  sets  case;  i.e.,  given  a  relation 
and  a  tuple  to  be  tested,  the  algorithms  defined  for  the 
operators  divide  the  membership  test  task  into  simpler 
membership  test  tasks  by  calling  each  other  until  the 
membership  test/tests  can  be  done  on  an  extensionally 
represented  relation,  as  we  described  in  chapter  1.  The 
membership  test  can  be  done  in  constant  time  on  an 
extensionally  represented  relation.  Because  many  of  the 
membership  test  algorithms  divide  the  membership  test  task 
into  simpler  membership  test  tasks  in  constant  time,  no 
matter  how  complex  the  initial  compound  relation  is,  the 
membership  test  operation  can  be  done  in  constant  time  on 
most  of  the  compound  relations.  We  will  explain  .Nome 
algorithms  as  if  "Force  2"  is  being  forced  iftpeatCKl'Uy  but  for 
others  we  will  specify  the  action  for  only  one  force 
operation,  in  order  to  make  the  algorithms  clear  to  the 
reader . 

R&S : 

The  algorithm  for  this  case  can  be  defined  as  follows: 

1.  Force  the  relation  R,  get  the  tuple  returned. 

2.  Test  if  this  tuple  in  relation  S;  if  so  return  this 


tuple,  else  go  to  step  1 


Mow  we  will  do  the  worst  case  asymptotical  time 


complexity  analysis  of  this  algorithm. 

We  know  that  in  the  worst  case  a  relation  may  have  at 
most  (n2)  tuples  when  "nN  is  the  common  cardinality  of  the 
domain  and  codomain  of  the  relation.  So  producing  all  the 
tuples  of  a  relation  requires  at  least  a  number  of  memory 
references  proportional  to  the  square  of  n,  independent  of 
the  underlying  representation  technique.  So,  because  the 
above  algorithm  produces  all  the  tuples  of  one  of  the  operand 
relations  (possibly  a  compound  relation)  it  requires  (at 
least)  a  number  of  memory  references  proportional  to  the 
square  of  ”na. 

Now  suppose  R  is  a  compound  relation.  If  we  can  produce 
the  tuples  of  this  compound  relation  in  0(n2)  time,  we  would 
not  care  about  whether  R  is  a  compound  relation  or  an 
extensionally  represented  primitive  relation  in  deciding  the 
asymptotical  time  complexity  of  R (S,  because  in  either  case 
we  are  producing  the  tuples  in  0(n2)  time.  Let's  suppose 
that  R  is  defined  as  T&D,  where  T  and  o  are  compound 
relations.  If  we  can  produce  the  tuples  of  T  in  0(n2)  time, 
we  would  not  care  about  whether  T  is  an  extensionally 
represented  relation  or  a  compound  relation.  So  if  we  can 
produce  the  tuples  of  each  kind  of  compound  relations  (i.e., 
R-S,  R6S,  R#S,  and  so  on)  in  0(n2)  time,  no  matter  how 
complex  the  initial  compound  relation  is,  we  can  produce  its 
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tuples  by  making  a  number  of  memory  references  proportional 
to  the  square  of  "n”.  Of  course  after  a  certain  number  of 
nesting  levels  and  in  some  instances,  the  value  corresponding 
to  the  constant  multiple  of  the  square  of  ”n”  may  be  much 
larger  than  the  value  of  the  cube  of  Nn”,  but  the 
asymptotical  time  complexity  behaviour  of  the  operation  on 
the  initial  compound  set,  is  still  0(n2). 

In  deciding  about  the  asymptotical  time  complexity  of  the 
above  algorithm  we  assumed  that  the  membership  test  operation 
(in  step  2)  can  be  done  in  constant  time,  but  this  is  not 
always  true.  In  these  algorithms  we  will  continue  to  assume 
the  membership  test  operation  as  a  constant  time  operation. 
There  are  some  compound  relations  for  which  we  can  not  define 
0(n2)  algorithms  for  producing  their  individuals,  such  as 
fan:R  and  san:R.  We  will  discuss  the  effects  of  these 
drawbacks  later. 
r|s 

The  algorithm  for  this  case  can  be  defined  as  follows: 

1.  Force  the  relation  R,  get  the  tuple  returned. 

2.  Test  if  this  tuple  is  in  relation  S.  If  so  go  to 
step  1  else  return  this  tuple. 

3.  After  no  more  tuple  remains  to  be  returned  from  the 
relation  R  as  a  result  of  the  repeated  force  operations, 
force  the  relation  S,  and  return  the  tuples  of  S  one  at  a 
time  as  the  compound  relation  R|S  is  forced. 


Now  we  will  do  the  worst  case  asyaptotical  time 
complexity  analysis  of  this  algorithm. 

The  same  tiae  complexity  analysis  that  we  have  done  in 
the  coapound  relation  R6S  case  applies  to  this  compound 
relation  also. 

R-S 

The  algorithm  for  this  case  can  be  defined  as  follows: 

1.  Force  the  relation  R,  get  the  tuple  returned. 

2.  Test  if  this  tuple  in  relation  S*  if  so  go  to  step  1 
else  return  this  tuple. 

Now  we  will  do  the  worst  case  asyaptotical  tiae 
complexity  analysis  of  this  algorithm. 

The  same  tiae  coaplexity  analysis  that  we  have  done  in 
the  coapound  relation  R&S  case  applies  to  this  coapound 
relation  also. 

alls 

The  algorithm  for  this  case  is  a  very  expensive 
algorithm.  We  will  only  explain  how  expensive  it  is  and  why 
it  is  expensive. 

Producing  all  tuples  of  this  coapound  relation  implies 
producing  all  individuals  of  the  Right  Members  set.  As  we 
know,  the  Right  Meabers  Set  of  the  coapound  relation  r||S  is 
equal  to  the  cartesian  product  of  the  RIM  set  of  R  and  The 
RIM  set  of  S.  As  we  explained  before,  obtaining  the 
individuals  of  a  composite  set  (in  the  new  system)  requires  a 
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number  of  memory  references  proportional  to  the  square  of  n, 
where  n  is  assumed  to  be  the  common  cardinality  of  the  domain 
and  codomain  of  the  relation  in  question.  So  we  obtain  the 
individuals  of  rimtR  in  0(n2)  time  and  for  each  individual 
obtained  we  produce  all  the  individuals  of  rim:S  in  0(n2) 
time.  So  assuming  the  cardinalities  of  the  RIM  sets  and  LEM 
sets  of  the  relations  R  and  S  are  equal  to  n  we  conclude  that 
we  produce  the  individuals  of  rim:(R||S)  by  making  the  number 
of  memory  references  proportional  to  (n4).  In  this  algorithm 
we  create  a  pair  relation  out  of  each  pair  of  individuals 
obtained  in  the  manner  explained  above  (by  producing  the 
individuals  of  rimtR  and  rimtS).  Me  apply  the  relation  R  to 
the  right  component  of  this  individual  (which  is  a  pair)  and 
we  apply  relation  S  to  the  right  component  of  this 
individual,  then  we  pair  the  resulting  individuals  up.  Mote 
that  this  function  application  operation  may  cost  us  0(n) 
time  which  may  make  the  algorithm's  time  complexity  behaviour 
0(n5) . 

(R*S) 

The  algorithm  for  this  case  can  be  defined  as  follows: 

1.  Force  the  rim:R,  get  the  individual  returned. 

2.  Apply  the  relation  R  and  relation  S  to  this 
individual,  pair  the  resulting  individuals  up  (i.e., 
establish  a  record  for  pairs  in  the  relation  table),  and  put 
the  pointer  to  the  record  of  this  pair  in  the  left  field  of 
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the  record  to  be  returned.  In  the  ease  Banner,  establish  the 
pointer  (in  the  right  field  of  the  record  to  be  returned)  to 
the  neaory  location  where  the  individual  obtained  in  step  1 

is  saved. 

3.  Return  the  record  (tuple)  obtained  in  step  2  to  the 
caller . 

Mow  we  will  do  the  worst  case  asymptotical  tine 
complexity  analysis  of  this  algorithm. 

As  we  stated  before,  in  most  of  the  cases,  we  can  obtain 
the  individuals  of  the  RIM  set  of  a  relation  in  0(n2)  time. 
In  the  above  algorithm,  for  each  individual  we  obtained,  we 
call  the  function  application  algorithm  two  times.  As  we 
know,  most  of  the  function  application  algorithms  have  linear 
time  complexity  behaviour.  Assuming  the  function  application 
algorithms  (in  this  case)  have  a  linear  time  complexity 
behaviour  and  the  cardinality  of  the  rim*R,  lemsR  and  lem:S 
are  each  equal  to  n,  we  conclude  that  the  algorithm  has  the 
asymptotical  time  complexity  behaviour  of  0(n3).  Note  that 
some  of  our  function  application  algorithms  have  used  the 
unimgs  operation  on  compound  relations.  That  means  this 
algorithm  will  have  quadratic  behaviour  in  the  new  system,  so 
in  the  worst  case  we  call  a  quadratic  algorithm  for  each 
individual  of  the  rimsR  obtained.  So  the  worst  case 
asymptotical  time  complexity  behaviour  of  the  algorithm  is 
in  fact  0(n*),  which  is  again  an  expensive  algorithm. 


The  algorithm  for  thia  ease  can  be  defined  as  follows: 

1*  Force  the  relation  R#  get  the  tuple  returned. 

2.  Switch  the  components  of  this  tuple  and  return  the 
resulting  tuple.  (*  That  means  put  the  pointer  to  the  right 
component  of  the  tuple  (which  belongs  to  R)  into  the  left 
field  and  left  component  of  the  tuple  into  the  right  field  of 
the  record  to  be  returned.  *) 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

As  we  can  see*  the  algorithm  does  not  do  more  than  two 
assignments#  so  it  has  the  worst  case  asymptotical  time 
complexity  behaviour  of  0(c). 

RS 

The  algorithm  for  this  case  can  be  defined  as  follows: 

1.  Force  the  relation  S#  get  the  tuple  returned. 

2.  Apply  the  relation  R  to  the  left  component  of  this 
tuple#  take  the  resulting  individual. 

3.  Put  the  pointer  to  this  individual  (actually  the 
memory  location  where  this  individual  is  saved)  in  the  left 
field  and  the  pointer  to  the  right  component  (individual)  of 
the  tuple  obtained  in  step  1  into  the  right  field  of  the 
record  to  be  returned. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 


A m  we  know,  if  the  above  algorithm  is  forced  repeatedly, 
step  1  costs  us  a  number  of  memory  references  proportional  to 
the  square  of  n,  where  n  is  assumed  to  be  the  common 
cardinality  of  the  rim:S  and  lemtS.  For  each  tuple  obtained 
in  step  1,  we  call  the  functional  application  algorithm. 
Assuming  the  function  application  algorithm  has  the  linear 
time  complexity  behaviour  and  the  lem:R  has  the  cardinality 
n,  we  conclude  that  the  algorithm  has  the  time  complexity 
behaviour  of  0(n3).  If  the  function  application  algorithm  in 
question  has  the  constant  time  complexity  behaviour,  the 
algorithm  would  have  the  time  complexity  behaviour  of  o(n2). 
Note  that  in  our  new  system,  the  function  application 
algorithms  using  the  unimg:  operation  on  compound  relations, 
will  automatically  have  quadratic  behaviour.  This  in  fact 
causes  the  worst  case  asymptotical  time  complexity  behaviour 
of  this  algorithm  to  be  0(n*). 

Now  we  will  define  the  algorithms  that  will  continue 
to  be  maintained  in  our  Force  primitive  and  are  associated 
with  our  five  basic  operations, 
unimg :R:x 

The  algorithm  for  this  case  can  be  defined  as  follows: 

1.  Force  the  Force  2  primitive  with  relation  R  being  the 
argument;  get  the  tuple  returned. 


2.  If  the  right  component  of  this  tuple  is  equal  to  the 
individual  x,  take  the  left  component  of  this  tuple  and 
return  it. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

As  can  be  seen  the  algorithm  makes  a  comparison  (taking  a 
constant  number  of  memory  references)  for  each  tuple  obtained 
and  we  know  that  the  tuples  of  a  relation  are  produced  by 
making  a  number  of  memory  references  proportional  to  at  least 
the  square  of  n,  where  n  is  assumed  to  be  the  common 
cardinality  of  the  rim:R  and  lem:R.  So  the  time  complexity 
behaviour  of  the  algorithm  is  at  least  0(n2). 

RltC 

The  algorithm  for  this  case  can  be  defined  as  follows: 

1.  Take  the  next  integer  from  the  global  count. 

2.  Force  the  set  C  repeatedly.  For  each  individual 
obtained  in  this  manner,  hash  to  the  LHT  using  this 
individual  with  the  integer  obtained  in  step  1  as  the 
relation  identifier.  Establish  the  record  of  this  individual 
in  the  LET.  Link  the  records  created  in  this  manner  to  each 
other  as  they  are  created. 

3.  Force  the  relation  R  repeatedly.  For  each  tuple 
obtained,  extract  the  right  component  individual  and  hash 
with  this  individual  into  the  LHT  using  the  integer  obtained 
in  step  1  as  the  relation  identifier.  If  there  exists  a 


rscord  for  this  individual  in  ths  LHT,  do  stop  4;  else  fores 
ths  rslation  R  in  ordsr  to  obtain  ths  nsxt  tupls. 

4.  Sxtract  ths  lsft  coaponsnt  individual  of  ths  currsnt 
tupls,  hash  with  this  individual  into  ths  RBT  with  ths 
intsgsr  obtainsd  in  stsp  1  as  ths  rslation  idsntifisr,  and 
sstablish  ths  rscord  of  this  individual  in  ths  RBT  undsr  this 
rslation  idsntifisr  if  thsrs  is  no  rscord  for  this  individual 
in  ths  RBT.  Link  ths  rscords  ersatsd  in  this  aannsr  to  sach 
othsr  as  thsy  ars  ersatsd. 

5.  Aftsr  no  mors  tuplss  rsaain  to  bs  producsd  in  ths 
rslation  R,  rsturn  ths  first  individual  of  ths  sst 
constructsd  in  ths  RBT  (in  stsp  4).  In  ths  rspsatsd  fores 
opsrations,  rsturn  ths  individuals  of  this  sst  ons  at  a  tins. 

Bow  ws  will  do  ths  worst  cass  asymptotical  tias 
coaplsxity  analysis  of  this  algorithm. 

Ws  ean  writs  ths  tias  coaplsxity  function  of  this 
algor itha  as  follows s 

f  «  K*(n2)  ♦  L*(n2)  ♦  M*n  +  C 
whsrss 

K  -  Ths  constant  nuabsr  of  asaory  rsfsrsncss  aads  in  sach 
itsration  of  stsp  2. 

L  ■  Ths  constant  nuabsr  of  asaory  rsfsrsncss  aads  in 
ordsr  to  obtain  sach  tupls  of  ths  rslation  R  in  stsp 
3. 
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M  ■  The  constant  number  of  meaory  references  aade  in  step 
5  in  order  to  return  each  individual  of  the  set 
obtained  (in  step  4) . 

C  ■  The  constant  nuaber  of  aeaory  references  aade  by  the 
remaining  steps. 

n  -  The  common  cardinality  (assumed)  of  the  LEM  and  RIM 
sets  of  all  relations  involved.  (If  the  cob  pound  set 
C  is  defined  in  teras  of  relations  and  relational 
operations#  the  LEM  and  RIM  set  cardinalities  of 
these  relations  are  also  equal  to  n) . 

In  the  above  algoritha  we  assuaed  that  the  individuals  of  the 
arguaent  set  C  are  producible  by  making  a  nuaber  of  memory 
references  proportional  to  the  square  of  n.  Because  the 
arguaent  set  C  aay  be  a  compound  set#  we  choose  the  typical 
coaplexity  behaviour  of  the  operation  producing  the 
individuals  of  C  as  0(n2).  In  the  above  coaplexity  function# 
the  first  term  corresponds  to  step  2#  the  second  term 
corresponds  to  step  3#  the  third  term  corresponds  to  step  5 
and  the  last  term  corresponds  to  the  remaining  steps.  As  we 
can  see#  the  algorithm  produces  the  resulting  set  in  the  RHT 
when  it  is  forced  for  the  first  time#  and  in  step  5  we  are 
returning  the  individuals  of  an  extensionally  represented  set 
one  at  a  tiae#  so  the  tera  corresponding  to  step  5  has  linear 
behaviour.  We  determine  the  asymptotical  time  complexity 
behaviour  of  this  algorithm  by  looking  at  the  term  of  the 


complexity  function  which  has  the  highest  exponent;  hence  we 
get  0(n2).  if  it  turned  out  that  we  produce  the  individuals 
of  the  set  C  or  the  tuples  of  the  relation  R  in  time 
proportional  to  (n*) ,  the  algorithm  automatically  becomes  an 
0(n3)  algorithm. 
lemsR 

The  algorithm  for  this  case  can  be  defined  as  follows: 

1.  Force  the  relation  R,  get  the  tuple  returned. 

2.  Take  the  left  component  individual  of  the  tuple  and 
return  it. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

The  asymptotical  time  complexity  behaviour  of  this 
algorithm  is  the  same  as  the  asymptotical  time  complexity 
behaviour  of  the  operation  which  produces  the  tuples  of  the 
relation  R;  i.e.,  if  we  are  producing  the  tuples  of  the 
compound  relation  R  in  time  proportional  to  the  square  of  nr 
the  time  complexity  behaviour  of  this  algorithm  becomes 
0(n2) . 
rim:R 

The  algorithm  for  this  case  is  very  similar  to  the 


algorithm  for  the  lem:R;  the  only  difference  is  we  take  the 
right  component  individual  instead  of  the  left  component 
individual  in  step  2. 


G.  ADDITIONAL  MEMBERSHIP  TEST  ALGORITHMS  FOR  THE  ALTERNATIVE 

METHOD 

In  this  section  we  will  define  the  algorithms  for  the 
membership  test  corresponding  to  the  relational  operators 
that  construct  compound  relations.  Because  a  relation  can  be 
viewed  as  a  set  of  tuples  and  the  tuples  can  be  viewed  as  the 
members  of  this  setf  we  will  continue  to  use  the  term 
"membership  test”. 

RfcS 

The  algorithm  for  this  case  can  be  defined  as  follows: 

1.  Test  if  the  given  tuple  is  in  relation  R;  if  so  do 
step  2  else  return  false. 

2.  Test  if  the  given  tuple  is  in  relation  S;  if  so 
return  true  else  return  false. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  alogrithm. 

If  we  assume  the  relations  R  and  s  are  extensionally 
represented  relations,  then  step  1  costs  us  a  constant  number 
of  memory  references,  and  similarly  for  the  second  step.  So 
in  this  case  the  algorithm  has  the  worst  case  asymptotical 
time  complexity  behaviour  of  0(c).  Now  suppose  the  relation 
R  is  a  compound  relation  defined  as  T&H,  where  T  and  H  are 
extensionally  represented  relations.  Since  the  membership 
test  operation  on  this  compound  set  also  requires  the 
constant  number  of  memory  references  to  be  made,  we  can  view 


T&H  as  an  extensionally  represented  relation  for  just  this 
purpose.  So  if  we  can  define  a  constant  time  algorithm  for 
each  kind  of  compound  relation  (i.e.,  R&S,  R-S,  Rc ,  R |  S, 
etc.)  no  matter  how  complex  the  initial  compound  relation  is 
we  can  do  the  membership  test  in  constant  time.  But  not  all 
kinds  of  compound  relations  can  be  associated  with  constant 
time  membership  test  algorithms;  we  will  discuss  the  effects 
of  this  inefficiency  later. 

R|S 

The  algorithm  for  this  case  can  be  defined  as  follows: 

1.  Test  if  the  given  tuple  is  in  relation  R;  if  so 
return  true,  else  do  step  2. 

2.  Test  if  the  given  tuple  is  in  relation  S;  if  so 
return  true. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  alogrithm. 

The  argument  that  we  have  done  in  the  compound  relation 
R&S  case  applies  to  this  compound  relation  also;  i.e.,  the 
algorithm  has  the  asymptotical  time  complexity  behaviour  of 
0(c)  . 

R-S 

The  algorithm  for  this  case  can  be  defined  as  follows: 

1.  Test  if  the  given  tuple  is  in  relation  R;  if  so  do 
step  2,  else  return  false. 


2.  Test  if  the  given  tuple  is  in  relation  S;  if  so 
return  falser  else  return  true. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  alogrithm. 

The  argument  that  we  have  done  in  the  compound  relation 
RSS  case  applies  to  this  compound  relation  also;  i.e.,  the 
algorithm  has  the  asymptotical  time  complexity  behaviour  of 
0(c) . 

Rc 

We  can  define  this  algorithm  as  follows: 

1.  Switch  the  component  individuals  of  the  given  tuple, 
test  if  the  resulting  tuple  is  in  relation  R;  if  so  return 
true  else  return  false. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  alogrithm. 

The  argument  that  we  have  done  in  the  compound  relation 
RSS  case  applies  to  this  compound  relation  also;  i.e.,  the 
algorithm  has  the  asymptotical  time  complexity  behaviour  of 
0(c)  . 
non:R 

We  can  define  this  algorithm  as  follows: 

1.  Test  if  the  given  tuple  is  in  relation  R;  if  so 
return  false  else  return  true. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  alogrithm. 


The  time  complexity  behaviour  of  the  algorithm  is  the 
same  as  the  time  complexity  behaviour  of  the  algorithms  given 
above  and  the  same  argument  applies. 

alls 

The  algorithm  for  this  case  can  be  defined  as  follows: 

Let  the  given  tuple  be  <(a,c),  (b,g)>  where  (a,c)  and 

(b,g)  are  pairs  (individuals). 

1.  Test  if  the  tuple  <a,b>  is  in  relation  R;  if  so  do 
step  2  else  return  false. 

2.  Test  if  the  tuple  <c,g»  is  in  relation  S;  if  so 
return  true  else  return  false. 

Mow  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  alogrithm. 

As  can  be  seen  the  algorithm  makes  two  membership  tests. 
If  both  membership  test  operations  have  the  constant  time 
asymptotical  time  complexity  behaviour  then  the  algorithm  has 
the  asymptotical  time  complexity  behavior  of  0(c).  If  any 
one  of  the  membership  test  operations  has  a  time  complexity 
function  which  dominates  the  constant  function,  then  the 
algorithm  has  the  same  asymptotical  time  complexity  behaviour 
as  the  time  complexity  behaviour  of  this  operation. 

R#S 

Me  can  define  this  algorithm  as  follows: 

Let  the  given  tuple  be  <(a,c),b>  where  (a,c)  is  a  pair 
(individual) . 
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1.  Test  if  the  tuple  <a,b>  is  in  the  relation  R;  if  so 
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do  step  2  else  return  false. 

2.  Test  if  the  tuple  <c,b>  is  in  the  relation  S;  if  so 
return  true. 

Mow  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

The  algorithm  has  the  same  asymptotical  time  complexity 
behaviour  as  the  asymptotical  time  complexity  behaviour  of 
the  algorithm  defined  for  R{|s. 

RS 

In  this  algorithm  we  have  to  use  the  unimg':  operation  in 
order  to  determine  if  the  given  tuple  is  in  the  relation  RS, 
because  the  left  component  of  the  given  tuple  should  be  a 
member  of  the  LEM  set  of  the  relation  R  and  the  right 
component  of  the  tuple  should  be  the  member  of  the  RIM  set  of 
the  relation  S.  Hence  we  can  not  easily  determine  if  the 
given  tuple  is  in  the  relation  RS,  especially  in  case  the 
relation  R  and  S  are  themselves  compound  relations.  The 
algorithm  is  as  follows: 

1.  Test  if  the  right  component  individual  of  the  given 
tuple  is  in  the  rim:S?  if  so  do  step  2  else  return  false. 

2.  Force  the  composite  set  unimg:R:x  repeatedly,  store 


the  individuals  of  this  set  in  the  RHT  as  it  has  been  done 
before  in  previous  algorithms. 


tr* 


3.  Force  the  unimg':R:y  repeatedly  where  y  is  the  left 
component  individual  of  the  given  tuple.  For  each  individual 
produced,  test  if  this  individual  in  the  set  established  in 
the  RHT  above;  if  so  return  true,  else  continue  testing  the 
next  individual  of  the  unimg':R:y. 

4.  If  neither  of  the  individuals  of  the  uniag':R:y  is  in 
the  set  established  in  the  RHT,  return  false. 

Now  we  will  do  the  worst  case  asymptotical  tiae 
complexity  analysis  of  this  alogritha. 

As  can  be  seen  the  algorithm  is  an  expensive  algorithm 
relative  to  the  other  membership  test  algorithms.  We  know 
that  producing  the  individuals  of  the  uniag:R:x  and  the 
unimg':R:y  both  requires  a  number  of  memory  references 
proportional  to  the  square  of  n,  where  n  is  assumed  to  be  the 
common  cardinality  of  the  LEM  and  the  RIM  sets  of  the 
relation  R.  So  the  algorithm  has  to  make  at  least  (n2) 
memory  references.  We  conclude  that  the  algorithm  has  the 
asymptotical  time  complexity  behaviour  of  0(n2).  Of  course, 
if  the  complexity  of  the  operations  unimg:R:x  and  unimg':R:y 
were  higher,  the  complexity  of  this  algorithm  would  increase. 

The  membership  test  algorithms  on  the  compound  relations 
fantR  and  san:R  are  much  more  costly  than  the  above 
algorithm;  hence  we  will  construct  the  extensional 
representation  structure  (in  less  time)  for  these  compound 
relations  in  order  to  be  able  to  do  the  membership  test.  If 
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we  also  construct  the  extensional  representation  structure 
Cor  the  compound  relation  RS  whenever  a  reference  to  this 
compound  relation  occurs  for  the  first  time#  we  can  assert 
that:  Given  that  the  compound  relations  fan:R,  san:R  and  RS 
are  represented  extensionally,  no  matter  how  complex  the  high 
level  compound  relation  is,  the  membership  test  on  this 
compound  relation  can  be  done  in  constant  time. 

H.  COMPARISON  OF  THE  TWO  METHODS 

We  stated  that,  in  the  first  method,  the  algorithms  may 
get  arbitrarily  expensive  depending  on  the  complexity  of  the 
compound  relation  or  set  in  question.  On  the  other  hand,  in 
the  second  method,  if  we  construct  the  extensional 
representation  for  the  kind  of  compound  relations  which  are 
associated  with  the  expensive  production  (Force  2  cases)  or 
membership  algorithms,  we  can  do  the  operations  in  0(n2)  time 
independent  of  the  complexity  of  the  compound  relation  or 
compound  set  in  question.  Let's  now  assume  that  we  did  not 
construct  the  extensional  representations  for  compound 
relations,  R#S,  R||S,  fan:R  and  san:R,  and  let's  assume  the 
relations  R  and  S  are  represented  extensionally.  In  this 
case  the  operation  (R#S)i:C  has  the  asymptotical  time 
complexity  behaviour  of  0(n)  by  using  the  first  method  and  if 
we  use  the  second  method  it  has  the  asymptotical  time 
complexity  behaviour  of  0(n4).  The  Table  1  shows  the 
differences  in  asymptotical  time  complexity  between  the  two 


methods.  In  this  table  we  also  show  the  differences  in 
asymptotical  time  complexity  in  the  case  when  we  represent 
some  of  the  compound  relations  extensionally  (in  the  second 
method) . 

We  should  not  consider  the  second  method  more  efficient 
than  the  first  method  because  when  we  produce  the  individuals 
of  a  compound  set  by  making  no  less  than  (n2)  memory 
references#  our  function  application  algorithms  and  set 
membership  test  algorithms  become  automatically  0(n2) 
algorithms#  while  they  were  0(n)  algorithms  in  the  case 
method  1  was  used.  Again  we  can  not  forget  that  those  linear 
function  application  and  set  membership  test  algorithms  exist 
in  the  case  the  operand  relations  of  the  compound  relations 
are  represented  extensionally.  The  cost  of  those  algorithms 
may  go  up  depending  on  the  complexity  of  compound  relation  or 
set  in  question.  Note  that  we  can  reduce  the  overall 
complexity  of  the  operations  in  the  first  method  by 
representing  some  kind  of  compound  relations#  like  R-S#  RS# 
fan:R#  etc.  extensionally  like  we  have  done  in  the  second 
method.  As  a  criterion  we  can  say  that#  if  the  nesting 
levels  in  our  compc  and  relations  do  not  exceed  1  or  2#  the 
first  method  should  be  used;  otherwise  the  second  method  is 
more  appropriate. 


I.  THE  WAY  THE  SYSTB!  HANDLES  THE  RESTRICTION  OPERATIONS 
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The  restriction  operations,  "Right  restriction”,  "Left 
restriction”,  and  the  "Restriction",  are  handled  by  the 
system  in  a  special  manner.  The  operations  that  we  defined 
before  are  done  in  exactly  the  same  manner  as  defined  on  the 
restricted  relations  also,  but  any  individual  obtained  from 
the  LEM  set  of  a  left  restricted  or  restricted  relation  is 
tested  for  membership  in  the  set  to  which  the  LEM  set  of  the 
relation  is  restricted.  In  the  same  manner  any  individual 
obtained  from  the  RIM  set  of  a  right  restricted  or  restricted 
relation  is  tested  for  membership  in  the  set  to  which  the  RIM 
set  of  this  relation  is  restricted.  Hence  the  system  treats 
the  restriction  operations  as  general  operations. 

This  feature  of  the  system  can  be  implemented  by  defining 
a  separate  routine  which  is  given  a  set  (possibly  a  composite 
set  represented  in  character  string  form)  and  an  individual 
drives  the  appropriate  routines  defined  for  the  membership 
test  in  order  to  test  the  membership  of  this  individual  in 
the  given  set.  This  routine  can  be  called  by  any  operation 
that  has  just  obtained  an  individual  from  the  LEM  or  RIM  set 
of  a  left  restricted  and/or  right  restricted  relation,  so 
this  operation  waits  for  positive  response  from  this  routine 
and  upon  getting  the  positive  response  (true),  the  operation 
may  do  whatever  it  intended  to  do  with  the  individual  in 
question. 
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IV 


PPRB  INTBNSIOKAL  REPRESENTATION  SYSTEM 


We  defined  the  priaitive  relations  as  the  relations 
defined  by  the  user,  which  aay  be  represented  either 
extensionally  or  intensionally.  In  the  saae  aanner  we 
defined  the  priaitive  sets  as  the  sets  defined  by  the  user, 
which  aay  be  represented  either  extensionally  or 
intensionally.  The  systea  we  defined  assuaed  that  the 
priaitive  relations  and  sets  are  represented  extensionally. 
Now  we  will  think  about  how  the  systea  can  be  adapted  to  the 
case  in  which  the  priaitive  relations  and  sets  are 
represented  intensionally;  in  other  words  when  we  have  an 
expression  representing  a  priaitive  relation  or  a  set  rather 
than  a  data  structure. 

In  defining  the  algor ithas  for  our  systea,  we  focused  on 
three  aain  groups  of  algor ithas,  naaely: 

1.  The  algor  ithas  for  the  production  of  the  individuals 
of  the  intensionally  represented  composite  sets. 

2.  The  algorithas  that  do  the  aeabership  test  on  the 
intensionally  represented  composite  sets. 

3.  The  algorithas  for  the  function  application  operation 
on  each  kind  of  intensionally  represented  composite  relation. 


If*  will  focus  on  thes*  thee*  groups  of  algorithms  in  the  case 
the  primitive  relations  and  sets  are  represented 
intensionally. 

In  defining  the  mechanism  for  the  production  of  the 
individuals  of  the  intensionally  represented  composite  sets, 
we  designate  five  basic  operations  and  we  reduce  the 
operations  on  those  coaposite  sets  to  these  five  basic 
operations  on  the  extensionally  represented  relations.  So 
if  we  can  define  these  operations  on  the  intensionally 
represented  primitive  relations  we  can  adapt  our  individual 
production  aechanisa  to  the  case  in  which  the  primitive 
relations  are  represented  intensionally.  Because  the  system 
does  not  pay  attention  to  the  way  the  primitive  relations  are 
represented  until  one  of  the  five  basic  operations  is  done  on 
the  primitive  relation,  we  can  adapt  the  system  to  this  case 
by  only  defining  the  algorithms  for  the  five  basic  operations 
on  the  intensionally  represented  primitive  relations.  But 
this  is  not  as  easy  as  it  seems  at  first  glance;  first  of  all 
if  a  relation  is  not  a  function,  we  can  not  easily  define  the 
code  which  represents  this  relation.  On  the  other  hand  if  we 
restrict  our  relations  to  the  functions,  three  of  the  five 
basic  operations.  Unit  image.  Unit  coimage.  Image,  become 
undefined  and  the  remaining  operations.  Left  members  and 
Right  members,  are  hard  to  define  on  the  intensionally 
represented  relation  because  in  some  cases  the  domain  of  a 


relation  may  be  an  infinite  set.  Producing  the  individuals 
of  this  set  brings  the  question,  "How  many  of  the  individuals 
will  be  produced?”  and  having  the  user  declare  the  interval 
of  input  values  (domain  individuals)  for  his  function  is  not 
logical  and  has  not  been  done  in  any  language.  So  the 
mechanism  for  the  production  of  the  individuals  of  the 
composite  sets  when  the  primitive  relations  are  represented 
intensionally,  is  hard  to  define  and  even  if  it  is  defined  it 
brings  many  undesirable  restrictions  to  the  user. 

In  defining  the  above  system  we  could  be  able  to  define 
less  costly  algorithms  for  membership  tests  on  some  kind  of 
composite  sets,  but  for  some  of  them  we  had  to  do  the 
membership  test  in  a  produce  and  test  fashion.  Hence  there 
is  no  point  in  defining  the  whole  mechanism  for  our  new  case 
by  knowing  the  fact  that  we  will  not  be  able  to  define 
algorithms  for  some  composite  sets.  So  the  mechanism  for 
membership  test  is  not  a  well  defined  mechanism  when  the 
primitive  relations  are  intensionally  represented.  The  prime 
reason  which  causes  that  is  the  "Image”  operation  which 
requires  the  argument  set  individuals  explicitly  in  order  to 
accomplish  its  job.  We  know  that  the  mechanism  for  the 
production  of  the  individuals  of  the  intensionally 
represented  composite  sets  is  not  a  well  defined  mechanism  in 
the  case  the  primitive  relations  and  sets  are  represented 
intensionally. 
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The  algorithms  that  £all  into  the  last  category  are  well 
defined  in  the  case  the  primitive  relations  are  represented 
intensionally*  but  these  relations  should  be  restricted  to  be 
the  functions.  So  in  that  mechanism*  the  user  can  define  his 
functions  in  advance  by  using  a  high  level  language  and  by 
compiling  them*  then  he  may  introduce  those  routines  into  our 
system  by  linking  them  to  the  system.  During  this  linking 
operation*  the  records  of  those  relations  (functions)  are 
established  in  the  relation  table  with  the  user  defined 
identifiers  being  the  relation  identifiers  and  pointers  to 
the  related  codes  are  established  in  the  POOLS  fields  of 
those  records.  So  whenever  the  function  application 
operation  is  to  be  performed  on  an  intensionally  represented 
primitive  relation  (user  defined  function)*  the  system  finds 
the  record  of  this  relation;  extracts  the  pointer  to  the  code 
from  the  POOLS  field  of  this  relation's  record  and  calls  the 


function  with  the  argument  individual. 

So  we  have  seen  that  the  structure  of  the  system  allows 
the  user  to  define  functions  and  embed  them  in  the  system* 
but  it  does  not  allow  the  user  to  use  his/her  functions  in 
defining  the  composite  sets.  As  an  example*  suppose  the  user 
defined  a  function  called*  "+"*  which  is  given  an  integer 
returns  the  successor  of  this  integer.  He  may  use  this 
function  in  any  expression  as  long  as  the  operation  to  be 
done  on  this  relation  is  function  application. 
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V.  CONCLUSIONS 


In  this  thesis  we  tried  to  find  out  efficient  ways  to 
represent  binary  relations  that  make  the  algorithms  of  the 
relational  operations  efficient  in  time.  We  did  this  by 
inspecting  the  worst  case  asymptotical  time  complexity 
behaviour  of  the  algorithms. 

The  first  representation  techniques  that  we  inspected 
were  the  extensional  representation  techniques.  Among  them 
we  selected  the  Incidence  Matrix  representation  and  used  it 
in  combination  with  hash  tables;  we  called  the  resulting 
representation  technique  the  Hash-Incidence-Vector 
representation.  This  representation  technique  enabled  us  to 
create  efficient  algorithms  relative  to  the  algorithms 
defined  for  the  Table  representation.  We  have  observed  that 
among  the  25  relational  operation  algorithms  defined  for  the 
Hash-Incidence- Vector  representation,  two  are  constant  time 
algorithms,  twelve  are  0(n)  time  algorithms,  seven  are  0(n2) 
time  algorithms,  three  are  0(n2)  algorithms  and  one  is  0(n4) 
algorithm.  Among  the  0(n2)  time  algorithms,  only  three  have 
been  observed  to  be  very  expensive  because  their  time 
complexity  functions  had  large  constants  in  front  of  the 
second  degree  terms.  The  other  0(n2)  time  algorithms  have 


been  observed  to  be  cheaper ,  because  the  constants  in  front 
of  the  second  degree  terms  of  their  complexity  functions  are 
less  than  1. 

As  a  result,  the  operations.  Relation  Intersection, 
Relation  Difference,  Relation  Union,  Relation  Composition 
(Relative  Product),  Parallel  Application,  First  Ancestral  and 
Second  Ancestral,  have  been  found  to  be  expensive  operations. 
Among  them.  Relation  Intersection,  Relation  Difference, 
Relation  Union  and  Parallel  application  operations  can  be 
associated  with  constant  time  and  0(n)  time  algorithms  in  the 
case  the  intensional  representation  techniques  used.  The 
algorithms  for  the  Relative  Product,  First  Ancestral,  and 
Second  Ancestral  operations,  have  0(n3)  worst  case 
asymptotical  time  complexity  behaviour.  However,  in  their 
time  complexity  functions  the  constants  in  front  of  the  third 
degree  terms  are  less  than  1,  which  makes  these  algorithms 
executable  for  small  n's  (100-200). 

As  can  be  seen,  most  of  the  algorithms  have  a  worst  case 
asymptotical  time  complexity  behaviour  of  0(n).  Why  couldn't 
we  define  more  efficient  algorithms?  The  first  reason  is 
most  of  the  operations  must  examine  all  of  the  individuals  of 
the  sets  involved.  The  second  reason  is  we  want  to  save  the 
original  relations  and  sets  while  constructing  nvw  relations 
and  sets  out  of  the  original  relations  and  sets.  This 
requires  extensive  copying  operations  and  causes  most  of  the 


algorithms  to  have  a  worst  case  asymptotical  time  complexity 
behaviour  of  0(n). 
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In  the  intensional  representations  case ,  we  observed  that 
the  algorithms  become  expensive  in  time  but  that  we  save  a 
lot  of  space  by  not  constructing  the  extensional 
representation  structures  for  the  intermediate  sets  and 
relations  in  the  memory.  On  the  other  hand,  we  have  been 
able  to  define  cheaper  intensional  algorithms  for  some  of  the 
operations  that  are  associated  with  expensive  extensional 
algorithms  (such  as  parallel  application). 

In  Chapter  IV  we  have  seen  that  the  pure  intensional 
representation  mechanism  is  not  a  well  defined  mechanism  but 
we  are  able  to  include  user  defined  functions  if  we  restrict 
the  use  of  those  functions  to  the  function  application 
operation. 

So,  the  extensional  representation  techniques  enable  us 
to  define  time  efficient  algorithms  and  the  intensiona) 
representation  techniques  enable  us  to  define  space  efficient 
algorithms.  For  us,  using  both  representation  techniques  in 
combination  with  each  other  (rather  than  firmly  selecting  one 
of  them)  is  necessary.  If  we  are  to  define  a  criterion  for 
establishing  this  combination,  we  would  use  intensional 
representation  techniques  for  Relation  Intersection,  Relation 
Union,  Relation  Difference  and  Parallel  Application 
operations.  We  would  use  extensional  representation 
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techniques  for  Relative  Product,  Pirst  Ancestral  and  Second 
Ancestral  operations  when  they  are  involved  in  a  composite 
set  construct;  otherwise,  we  would  use  intensional 
representation  techniques  for  these  operations.  The  above 
criterion  cam  be  refined  by  taking  into  account  the  available 
hardware  features.  For  example,  if  we  have  a  limited  memory 
we  would  use  the  intensional  representation  techniques  for 
most  of  the  operations. 

As  can  be  easily  seen,  it  is  feasible  to  implement  the 
language  on  conventional  architectures.  But  it  would  be  nice 
to  have  an  architecture  which  supports  this  language.  This 
architecture  has  to  have  at  least  these  properties: 

1.  It  has  to  support  hash  coding. 

2.  it  has  to  have  pipelining,  or  an  equivalent  mecha¬ 
nism,  which  has  at  least  an  ORing  stage. 

3.  It  has  to  have  a  mechanism  to  speed  up  copying  opera¬ 
tions. 

4.  It  has  to  support  bit  string  and  character  string 
data  types. 

As  we  indicated  before,  the  efficiency  of  most  of  our 
algorithms  increases  as  the  memory  word  length  increases.  So 
as  long  as  we  can  fetch  more  bits  for  each  memory  cycle,  the 
speed  of  our  algorithms  increases  proportionally.  Thus  we 
must  be  careful  about  the  word  length  if  we  use  the  Hash- 
Incidence-Vector  representation. 


We  conclude  that  it  is  feasible  to  implement  this 
language  on  conventional  architectures  and  that  we  can  make 
full  use  of  this  powerful  language  by  having  more  suitable 
architectures. 


APPENDIX  A 


THE  BXTENSIONAL  ALGORITHMS  CONTINUED 


Image  Operation  (Rl:x)  : 

This  operation^  given  a  set  C;  produces  the  set  o£ 
individuals  in  which  each  individual  is  in  relation  with  at 
least  one  individual  in  the  given  set  C  under  the  relation 
that  is  being  applied  to  the  set  C.  This  means  the  operation 
is  effectively  performing  the  "Unit  Image”  operation  on  each 
individual  of  the  given  set  C,  then  performing  the  set  union 
operation  on  the  resulting  sets  to  obtain  the  set  in  which 
each  individual  is  in  relation  with  at  least  one  individual 
of  the  set  C.  He  can  state  this  more  carefully:  Let  R  be  a 
relation  and  C  be  a  set#  then  R!:C  is  the  set  of  all  y  such 
that  yRx  for  some  x  in  C. 

The  algorithm  for  Hash-Incidence-Vector  representation  is 
as  follows: 

1.  Get  the  relation  identifier  and  the  set  identifier. 

2.  Hash  with  the  relation  identifier  to  the  relation 
table  and  find  the  record  of  the  relation;  follow  the 
pointers  found  in  the  PFLM  and  in  the  PFRM  fields  of  that 
record  and  find  the  records  of  the  first  left  member  and  the 
first  right  member  respectively. 

3.  Hash  with  the  set  identifier  to  the  set  table;  find 
the  record  of  the  set;  follow  the  pointer  found  in  the  PSS 
field  of  that  record  and  find  the  first  record  of  the  set 
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structure.  Start  from  the  beginning  of  the  linked  list 
structure  of  the  set  and  proceed  down  in  that  structure 
record  by  record.  For  each  record  found  in  this  manner  hash 
with  the  individual  being  represented  by  that  record  to  the 
RHT  under  the  given  relation  identifier.  If  the  RHT  record 
of  that  individual  is  present  in  the  RIM  set  of  the  relation, 
extract  the  index  of  the  RHT  record  that  represents  that 
integer  and  hash  into  the  RHT  with  that  individual  under  the 
relation  identifier: 

•$$$• 

and  establish  the  RHT  record  of  that  integer.  Link  the 
records  created  in  this  manner  in  the  RHT  by  their  TASE  links 
as  they  are  created. 

4.  Start  from  the  beginning  of  the  LEM  set  of  the 
relation,  proceed  down  in  the  LEM  set  record  by  record.  For 
each  record  found,  extract  the  index  of  the  record.  Put  it 
in  an  index  register  and  increment  it  up  to  the  number: 

INDEX  +  cardinality  of  the  RIM  set  of  the  relation 
by  beginning  with  1.  For  each  increment  hash  into  the  RHT 
with  the  integer: 

CURRENT  VALUE  -  INDEX 

under  the  relation  identifier  "$$$".  If  a  record  for  the 
resulting  integer  is  found  to  be  in  the  RHT,  call  this 
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integer  J,  reference  the  incidence  vector  of  the  relation 
with  the  index 

K  -  J-l 

by  calling  the  "Reference"  algorithm.  If  there  ia  a  1  in  the 
corresponding  incidence  vector  location,  hash  into  the  SHT 
with  the  left  individual'  being  represented  by  the  current  LEM 
set  record  and  establish  the  record  of  that  individual.  Link 
the  SHT  records  created  in  this  manner  to  each  other  as  they 
are  created.  Keep  a  count  and  increment  that  count  for  each 
record  created  in  the  SHT  by  beginning  with  0.  As  soon  as  a 
1  is  found  for  a  left  individual  in  the  above  manner,  quit 
with  that  left  individual  and  continue  to  perform  the  above 
process  for  the  next  individual  of  the  LEM  set,  by  following 
the  TASB  link  of  the  current  left  individual's  record  and 
finding  the  next  individual's  LHT  record  in  the  linked  list 
structure  of  the  LEM  set  of  the  relation. 

5.  Start  from  the  beginning  of  the  linked  list  of 
temporary  records  created  in  the  step  3,  (in  the  RHT) . 
Proceed  down  in  that  linked  list  structure  and  for  each 
record  found,  hash  to  the  RHT  with  the  integer  (index)  being 
represented  by  that  record  under  the  relation  identifier 
"$$$",  and  disconnect  it  from  the  RHT  entry  if  it  is  directly 
connected  to  that  RHT  entry. 


6.  Hash  to  tha  set  table  with  the  set  identifier: 
(relation  identifier) ' (argument  set  identifier) 

This  means,  if  the  relation  being  applied  has  the  identifier 
R  and  the  argument  set  identifier  is  C,  hash  into  the  set 
table  under  the  set  identifier: 

R!  :C 

Establish  the  record  of  that  set,  put  the  pointer  to  the 
linked  list  structure  established  in  SHT  into  the  PSS  field 
of  that  record.  Put  the  last  value  of  the  count  into  the 
CARD  field  of  that  record. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

In  the  worst  case  every  left  individual  may  be  in 
relation  with  only  one  right  individual  and  the  record  of 
that  right  individual  may  be  the  last  record  of  the 
relation's  RIM  set.  In  that  case,  in  step  4  we  make: 
ceiling (n/C) 

memory  references,  where: 

C  ■  memory  word  length. 

n  *  The  cardinality  of  the  RIM  set  of  the  relation. 

In  addition  to  that,  even  though  it  is  unlikely,  the  argument 
set  C  may  be  a  super  set  of  the  RIM  set  of  the  relation,  so 
in  step  3  we  effectively  make  a  separate  copy  of  the  RIM  set 
of  the  relation. 


So  under  these  considerations  and  by  assuming  the 
cardinalities  of  the  RIM  set  and  the  LEM  set  of  the  relation 
are  the  same  ("n"),  we  write  the  worst  case  time  complexity 
function  of  that  algorithm  as: 

f  ■  R*n*ceiling (n/C)  ♦  L*n  +  P*n  +  D 
where: 

n  *  The  cardinality  of  the  RIM/LEM  set  of  the  relation. 

K  *  The  number  of  memory  references  made  for  each  left 
individual's  record  found  in  the  step  4. 

m  ■  The  cardinality  of  the  argument  set. 

L  -  The  number  of  memory  references  made  for  each  set 
record  found  in  step  3. 

P  -  The  number  of  memory  references  made  for  each  set 
record  found  in  step  S. 

and: 

First  term  corresponds  to  the  step  4,  second  term  corresponds 
to  the  step  3,  third  term  corresponds  to  the  step  5,  fourth 
term  corresponds  to  the  other  steps  of  the  algorithm.  Let 
n/C  be  an  integer  and  K/C  ■  V,  then  the  complexity  function 
becomes: 

f  *  V*(n2)  +  P*n  +  L*m  ♦  D 

In  this  algorithm  we  significantly  reduce  the  average 
case  complexity  in  step  4.  That  is,  as  soon  as  a  1  for  a 
left  individual  is  found  that  corresponds  to  a  tuple  which 
has  the  right  individual  from  the  given  set  C,  we  quit  with 


that  left  individual  and  continue  with  the  next  one  in  the 
LEM  set.  Especially  in  large  relations  this  nay  decrease  the 
coaplexity  of  the  tern: 

ceiling (n/C) 

by  sons  constant.  But  in  order  to  do  that  we  had  to  stand 
for  sone  linear  terns  in  the  conplexity  function,  and  in  sone 
particular  cases  the  strength  of  one  of  those  linear  terns 
nay  doninate  the  coaplexity  of  the  first  tern. 

But  we  are  concerned  with  the  worst  case  asynptotical 
tine  coaplexity  behaviour  of  that  algorithn.  By  looking  at 
the  exponent  of  the  tern  with  the  larger  exponent  and  by 
assuaing : 

L*n  <  V*(n2) 

we  conclude  that  the  worst  case  asymptotical  tine  conplexity 
behaviour  of  that  algorithn  is  0(n2). 

Now  we  have  to  define  the  algorithm  for  the  table 
representation.  The  algorithn  is  as  follows: 

1.  Start  fron  the  beginning  of  the  linked  list  structure 
of  the  argument  set  and  proceed  down  in  that  linked  list 
record  by  record.  For  each  record  found  in  this  manner, 
search  the  individual  being  represented  by  that  record  in  the 
right  colunn  of  the  relation's  table  by  starting  fron  the 
beginning  of  the  table  and  by  looking  up  the  right  individual 
of  each  record  found  while  proceeding  in  the  table  record  by 
record.  Because  there  is  a  possibility  of  a  duplication  of 


the  individuals  in  the  tight  column,  do  it  exhaustively.  For 
each  record  £ound  to  have  the  individual  in  question  as  right 
individual,  hash  into  the  SHT  with  the  left  individual  o£ 
that  record  under  the  new  set  identifier  (described  in  step  6 
of  the  previous  algorithm),  and  establish  the  record  of  that 
individual  in  the  SBT  if  the  record  of  that  individual  has 
not  been  established  in  the  SHT  previously.  Link  the  records 
created  in  this  manner  to  each  other  by  their  TASE  links  as 
they  are  created.  Keep  a  count  beginning  with  0  and 
increment  it  for  each  record  created  in  the  above  manner. 

(*  Step  2  is  called  the  disconnection  operation.  *) 

2.  Do  step  6  of  the  previous  algorithm. 

Now  *e  will  do  the  worst  case  asymptotical  time 
cosiplexity  analysis  of  this  algorithm. 

As  can  be  seen  the  algorithm  is  a  costly  algorithm;  in 
fact  there  are  some  other  ways  to  do  it  more  efficiently. 
One  way  is  to  convert  the  table  representation  into  an 
adjacency-list-like  representation  and  establish  the  table  in 
the  SCHT.  As  we  mentioned  in  the  storage  requirements 
analysis  for  large  relations,  we  may  use  up  a  large  part  of 
our  memory  source  and,  even  though  it  is  done  temporarily, 
that  may  cause  the  heap  to  get  too  large,  etc.  In  fact  the 
resulting  analysis  would  be  attributed  to  the  adjacency  list 
representation  rather  than  the  table  representation  if  we 
would  have  done  that. 
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We  write  the  worst  case  time  complexity  function  of  that 
algorithm  as  follows: 

f  -  K*m*p  +  C 
where: 

p  *  Relation  size/Table  size. 

In  the  worst  case  the  relation  may  be  the  universal  relation, 
so  by  assuming  the  cardinalities  of  the  LEM  set  and  the  RIM 
set  of  the  relation  are  equal  to,  say,  ”n",  we  may  replace 
the  variable  "p”  with: 
n*n 

So  we  rewrite  the  complexity  function  as: 

P  -  K*m*(n2)  +  D 
where: 

m  ■  The  cardinality  of  the  argument  set  (C) . 
n  *  The  cardinality  of  the  LEM/RIM  set  of  the  relation. 

K  ■  The  constant  number  of  memory  references  made  for 
each  argument  set  record  found  in  step  1. 

D  *  The  constant  number  of  memory  references  made  by  the 
housekeeping  operations. 

Clearly,  the  first  term  corresponds  to  step  1  and  the  second 
term  corresponds  to  the  constant  number  of  memory  references 
made  by  the  housekeeping  operations  and  the  number  of  memory 
references  made  in  the  step  2.  Note  that  in  the  worst  case 
”m"  may  be  greater  than  or  equal  to  "n”.  Let's  assume  the 
"m"  is  a  constant  multiple  of  ”n”  and  multiply  that  constant 


with  the  constant  K  for  making  the  behaviour  of  the  algorithm 
clear#  then  the  complexity  function  becomes: 

f  -  K*  (n3)  +  C 

So  we  conclude  that  the  worst  case  asymptotical  time 
complexity  behaviour  of  that  algorithm  is  0(n3)  or  worse. 
For  example  if  m  is  equal  to  the  square  of  n  in  some 
instance#  the  algorithm  behaves  like  an  0(n4)  algorithm. 
Complement  of  a  relation  (non:R) : 

The  complement  of  a  relation  can  be  defined  as  the  set  of 
tuples  that  belong  to  the  universal  relation  on  the  MEM  set 
of  the  original  relation  other  than  the  tuples  that  are  in 
the  original  relation.  So  the  incidence  vector  of  the 
complement  of  a  relation  is  the  incidence  vector  of  che 
original  relation  in  which  all  the  entries  are  complemented. 
In  the  hash  incidence  vector  representation  case#  all  we  have 
to  do  is  complement  the  incidence  vector  as  a  whole  to  obtain 
the  complement  of  the  relation  in  question.  But  we  must  not 
forget  that  we  want  to  preserve  the  original  relation  for 
possible  subsequent  references#  so  we  need  to  make  a  separate 
copy  of  the  original  relation.  The  algorithm  for  Hash- 
Incidence-Vector  representation  is  as  follows: 

1.  Make  separate  copies  of  the  LEM  set  and  the  RIM  set 
of  the  original  relation  under  the  new  relation  identifier 
"non:R" . 


2.  Allocate  a  block  of  memory  as  large  as  the  original 
incidence  vector. 

3.  Establish  the  record  of  the  new  relation  in  the 
relation  table*  furnish  its  fields  as  was  done  in  the 
previous  algorithms. 

4.  Pipeline  the  original  incidence  vector*  obtain  the 
complements  of  the  sequence  of  bits  as  they  fit  into  the 
accumulator  and  copy  them  to  the  corresponding  location  in 
the  new  incidence  vector . 

In  this  algorithm  we  assumed  the  existence  of  some  hardware 
help  (pipelining)  but  that  does  not  change  the  asymptotical 
complexity  behaviour  of  that  algorithm.  Since  it  only  speeds 
up  the  execution  by  some  constant  factor*  in  the  absence  of 
pipelining  the  resulting  asymptotical  time  complexity  would 
be  the  same.  The  worst  case  time  complexity  function  of  that 
algorithm  can  be  written  as: 

f  »K*n  +  L*m  +  T*( (n*m)/Cl*C2) )  +  C 
where: 

m  »  The  cardinality  of  the  LEM  set  of  the  original 
relation. 

n  ■  The  cardinality  of  the  RIM  set  of  the  original 
relation. 

The  constant  K  is  the  number  of  memory  references  made  while 
copying  each  RIM  set  record;  the  constant  L  is  the  number  of 
memory  references  made  while  copying  each  LEM  set  record;  the 
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constant  T  is  the  number  of  memory  references  made  while 
complementing  and  copying  each  bit  sequence  of  the  original 
incidence  vector;  the  constant  Cl  is  the  memory  word  length; 
the  constant  C2  is  the  pipelining  factor;  and  the  constant  C 
is  the  number  of  memory  references  made  by  the  housekeeping 
operations  such  as  updating  the  relation  table.  Let  m»n, 
Z-K+L  and  tJ-T/(Cl*C2);  we  can  rewrite  the  function  as: 

f  *  0*n*n  +  Z*n  +  C 

Clearly  the  algorithm  has  the  worst  case  asymptotical  time 
complexity  behaviour  of  O(n^).  We  can  expect  some 
contribution  from  pipelining  and  the  speed  of  the  complement 
operation.  Of  course  we  are  not  copying  n*m  memory  location 
but, 

(m*n)/Cl 

memory  locations. 

Now  we  have  to  define  the  algorithm  for  the  table 
representation.  The  complexity  and  the  high  cost  of  that 
algorithm  should  be  apparent  to  the  reader  at  this  point. 
One  relatively  efficient  way  is  to  use  the  SCHT  mechanism. 
The  algorithm  is  as  follows: 

1.  Start  from  the  beginning  of  the  relation's  table  and 
proceed  down  the  table.  For  each  table  record  found/  extract 
the  right  individual/  hash  to  the  SCHT  with  that  individual/ 
and  establish  its  record.  Link  the  right  individuals' 
records  in  the  SCHT  by  their  TASE  links  as  they  are  created. 
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(*  Step  1  effectively  creates  the  RIM  set  of  the  relation  in 
the  SCHT  *) 

2.  Perform  the  disconnection  operation  on  the  SCHT. 
Mark  the  beginning  of  the  RIM  set  of  the  relation. 

3.  Start  from  the  beginning  of  the  relation's  table  and 
proceed  down  in  the  table,  record  by  record.  For  each  record 
found  extract  the  left  individual,  hash  into  the  SCHT  with 
that  individual,  and  create  a  record  of  that  individual  (if 
there  is  no  record  for  that  individual  in  the  SCHT).  Make  a 
separate  copy  of  the  RIM  set,  and  set  the  TASE  link  of  the 
left  individual's  record  created  to  the  copy  of  the  RIM  set. 
(*  In  step  3  we  have  established  the  universal  relation  of 
the  given  relation  in  SCHT,  in  an  ad jacency-list-like 
representation.  *) 

4.  Start  from  the  beginning  of  the  relation's  table  and 
proceed  down  in  the  table  record  by  record.  For  each  record 
found  extract  the  left  individual  and  extract  the  right 
individual,  hash  into  the  SCHT  with  the  left  individual,  find 
the  SCHT  record  of  that  individual,  search  the  right 
individual's  record  in  the  bucket  (copy  of  the  RIM  set)  by 
following  the  TASE  links  between  the  records  in  the  bucket 
and  delete  it  from  the  bucket.  (*  In  step  4  we  have 
established  the  complement  of  the  given  relation  in  SCHT,  in 
an  adjacency-list-like  representation.  *) 


5.  Start  from  the  beginning  of  the  relation's  table  and 
proceed  down  in  the  table  record  by  record.  Por  each  record 
found  extract  the  left  individual  and  hash  with  that 
individual  into  the  SCHT.  If  a  record  of  that  individual  is 
found  and  the  TASS  link  field  of  that  record  does  not  contain 
the  value  "nil",  create  a  new  table  record  of  the  new 
relation,  and  copy  the  PML  field  of  the  left  individuals 
record  into  the  "left"  field  of  the  table  record  created. 
Follow  the  pointer  found  in  the  TASS  link  of  the  left 
individual's  SCHT  record  and  find  the  right  individual's 
record  (the  record  of  the  right  individual  that  is  in 
relation  with  the  left  individual  in  question  under  the  new 
relation).  Copy  the  PML  field  of  that  record  into  the 
"right"  field  of  the  table  record  created.  If  there  remains 
other  right  individuals'  records  in  the  bucket,  create  a  new 
table  record  for  each  of  them  and  copy  the  PML  field  of  the 
left  individual's  record  into  the  "left"  field  and  the  PML 
field  of  the  right  individual's  record  (in  turn)  into  the 
"right”  field  of  that  record.  Link  the  table  records  created 
in  this  manner  to  each  other  by  their  "link"  fields.  Delete 
a  left  individual's  record  and  the  bucket  of  records 
connected  to  it  when  all  the  table  records  that  can  be 
created  from  them  are  created.  (*  Step  5  establishes  the 
table  of  the  complement  of  the  given  relation  *) 
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Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

We  have  to  define  what  would  be  the  worst  case  for  this 
algorithm.  In  the  worst  case;  the  relation  may  be  a 
universal  relation  on  its  LEM  set  and  RIM  set.  In  that  case 
the  resulting  relation  is  the  empty  relation.  In  that  ease 
step  4  of  the  algorithm  becomes  very  costly.  The  worst  case 
time  complexity  function  is  given  below: 

f  -  K*p  +  L*n  ♦  M*((p-m)+**n)  +  N*m»n*(n+1)  +  Q*p  +  C 

2 

where: 

p  *  n*m  *  relation  size. 

n  *  the  cardinality  of  the  RIM  set  of  relation. 

m  -  the  cardinality  of  the  LEM  set  of  relation. 

The  first  term  corresponds  to  step  1 ,  the  second  term 
corresponds  to  step  2,  the  third  term  corresponds  to  step  3r 
the  fourth  term  corresponds  to  step  4,  and  the  fifth  term 
corresponds  to  step  5  of  the  algorithm.  Constants  K,  L,  M,  N 
and  Q  represent  the  number  of  memory  references  made  in  each 
iteration  of  the  corresponding  steps.  Constant  C  represents 
the  number  of  memory  references  made  by  the  housekeeping 
operations. 

The  third  and  fourth  terms  of  the  complexity  function  may 
not  be  clear  to  the  reader ,  so  we  will  explain  how  we  found 
those  terms.  In  step  3  we  got  through  the  entire  table  of 
the  relation,  but  we  made  m  separate  copies  of  the  RIM  set  of 
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the  relation,  where  m  is  the  number  of  distinct  left 
individuals  in  the  left  column  of  the  table.  Bence  we  make 
(p-m)  memory  references  without  making  the  separate  copy  of 
the  RIM  set  in  the  SCHT,  for  continuing  to  proceed  down  in 
the  table.  In  addition  to  that  we  make  m  memory  references 
and  as  a  result  of  each  of  them  we  make  a  separate  copy  of 
the  RIM  set  of  the  relation  which  requires  n  memory 
references.  Mote  that  even  though  we  say  that  we  make  (p-m) 
memory  references  or  m  memory  references,  these  are  not  the 
actual  memory  references  we  make.  In  fact  these  are  the 
iteration  factors  to  be  multiplied  by  the  constant  number  of 
memory  references  made  in  each  iteration,  which  is 
represented  as  the  averaged  constant  K.  In  the  fourth  step 
again  we  get  through  the  table  of  the  relation.  Because  in 
the  worst  case  the  size  of  the  table  is  equal  to: 
n*m 

we  make  n  memory  references  for  each  of  the  distinct  m  left 
individuals.  For  each  of  those  n  memory  references  we  have 
to  search  in  the  bucket  one  of  the  n  right  individuals,  but 
after  searching  and  finding  one  of  the  right  individuals  we 
delete  the  record  of  that  individual  from  the  particular 
bucket  (that  belongs  to  the  one  of  the  m  left  individuals)  in 
question  so  the  bucket  size  decreases  by  1.  A  subsequent 
search  for  one  of  the  remaining  right  individuals  has  to  be 
done  in  a  bucket  smaller  than  the  first  bucket.  By  assuming 
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that  in  the  worst  case  the  bucket  is  searched  to  the  end  and 
the  record  of  the  riqht  individual  in  question  is  always 
found  as  the  last  record  of  the  bucket,  we  make: 

n,  (n-1) ,  (n-2) ,  (n-3) ,  . .  (n-n+1) 

memory  references  for  each  distinct  left  individual  (that  has 
a  bucket  in  SCHT).  We  can  write  the  above  sequence  in  a 
compact  form  as: 
n 

i  -  n(n+l) 

i-1 

We  multiply  the  iteration  factors  with  the  constant  number  of 
memory  references  made  in  each  interation  (which  is  in  this 
case  ”N”)  to  obtain  the  fourth  term. 

We  accepted  the  relation  size  as  the  product  of  the 
cardinalities  of  the  RIM  and  the  LEM  sets.  In  addition  to 
that: 

Let, 

U  -  H/2 

Z-K  +  Q  +  M  +  U 
S  -  L  -  M 

and  of  course  p  *  n*n;  then  the  complexity  function  becomes: 

f  •  0*(n3)  ♦  Z*  (n2)  ♦  S*n  ♦  C 
a  polynomial  of  degree  3. 
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So  wo  conclude  that  the  worst  case  asymptotical  time 
complexity  behaviour  of  this  algorithm  is  0(n3).  So  the 
complexity  function  of  this  algorithm  asymptotically 
dominates  the  one  we  defined  for  the  previous  algorithm. 

Size  Operation  (#:C)  : 

This  operation  is  defined  only  for  sets.  We  often  need 
the  size  or  in  other  words,  the  cardinality  of  a  set.  This 
operation  provides  us  with  the  cardinality  of  a  given  set 

The  algorithm  is  as  follows: 

1.  Get  the  identifier  of  the  set. 

2.  Hash  with  this  identifier  to  the  set  table:  find  the 
record  of  the  set. 

3.  Look  up  the  "size”  field  of  that  record.  If  it  is 
not  negative  then  return  the  contents  of  that  field ,  else  do 
step  4. 

4.  Follow  the  PSS  field  of  the  record  found  in  step  2: 
find  the  first  set  record.  Proceed  in  the  set  structure  by 
following  the  TASS  link  fields  of  the  set  records.  Keep  a 
count  which  is  initialized  to  0  and  increment  it  for  each 
record  found  above. 

5.  After  the  linked  list  structure  of  the  set  is 
exhausted,  return  the  value  of  the  count  and  also  establish 
it  in  the  "size"  field  of  the  set's  record  in  the  set  table. 

Clearly  the  algorithm  goes  through  the  set  structure  once 
and  has  the  complexity  function: 


f  »  K*n  +  C 

where  constant  K  is  the  number  of  memory  references  made  for 
each  record  of  the  set  found,  constant  C  is  the  number  of 
memory  references  made  in  steps  1,  2,  3  and  5,  and  variable 
”n”  is  the  cardinality  of  the  given  set. 

So  the  algorithm  has  the  worst  case  time  complexity 
behaviour  of  0(n). 

Pair  Operation  (,  )  x,y  ■  (x,y)  : 

That  operation  takes  two  individuals  and  constructs  a 
relation  that  has  only  one  tuple  in  it.  The  first  argument 
individual  becomes  the  left  member  and  the  second  argument 
individual  becomes  the  right  member  of  the  unique  tuple  of 
the  resulting  relation.  The  algorithms  for  Hash-Incidence- 
Vector  representation  and  table  representation  are  equally 
simple. 

The  algorithm  for  Hash-Incidence-Vector  representation  is 
given  below: 

1.  Get  the  argument  individuals. 

2.  Hash  to  the  relation  table  under  the  relation 
identifier , 

"(first  argument  individual) (second  argument  individual)” 
establish  its  record  in  the  relation  table. 

3.  Hash  to  the  LHT  with  the  first  argument  individual 
(of  course  after  concatenating  it  with  the  above  relation 
identifier).  Establish  its  LHT  record,  put  1  into  the  index 
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field  of  that  record#  put  nil  into  the  TASE  field#  and  put 
the  pointer  to  the  aeaory  location  where  the  individual  is 
actually  stored  into  the  PML  field  of  that  record.  Establish 
the  above  relation  identifier  in  the  "Rid”  field  of  that 
record.  Put  the  pointer  to  that  record  into  the  PPLM  field 
of  the  relation's  record  established  in  step  2. 

4.  Repeat  step  3  for  the  second  argument  on  the  RHT 
(i.e.#  hash  into  the  RHT  instead  of  LHT) . 

5.  Allocate  a  aeaory  location  for  the  incidence  vector; 
set  first  bit  fraa  the  left  to  1  and  the  others  to  0. 

6.  Put  the  address  of  that  aeaory  location  into  the  base 
field  of  the  relation's  record  in  the  relation  table. 

7.  Put  1  into  both  the  ”|RIM|”  and  "|lem|”  fields  of  the 
relation's  record. 

The  coaiplexity  function  of  the  algor itha  is: 

f  »  Cl 
where: 

constant  "Cl"  is  the  nuaber  of  aeaory  references  made  in  the 
algorithm.  The  algorithm#  under  every  condition  makes  Cl 
aeaory  references.  So  the  algorithm  has  the  worst  (also  the 
average)  case  tiae  complexity  behaviour  of  0(c). 

The  algorithm  for  table  representation  seems  less  costly 
than  this  algorithm#  but  we  have  to  remember  that  we  did  not 
define  the  environment  in  which  the  table  representation  is 
defined.  That  algorithm  may  also  be  as  costly  as  the 


previous  algorithm  depending  on  the  environmental 
requirements. 

The  algorithm  for  table  representation  is  given  below: 

1.  Get  the  argument  individuals. 

2.  Allocate  a  table  record. 

3.  Put  all  into  the  "link”  field  of  that  record. 

4.  Put  the  pointer  to  the  memory  location  where  the 
first  argument  individual  is  saved  into  the  "left”  field,  and 
pointer  to  the  memory  location  where  the  second  argument 
individual  is  saved  into  the  "right"  field  of  that  table 
record. 

This  algorithm,  like  the  previous  algorithm,  has  a 
constant  complexity  function.  But  the  constants  are 
different.  We  conclude  that  both  algorithms  are  cheap 
constant  time  algorithms. 

Left  Members  (lem:R)  : 

This  operation  takes  a  relation  identifier  and  returns 
the  set  of  left  members  of  that  relation.  A  left  member  of  a 
relation  can  be  defined  as  the  member  which  occurs  on  the 
left  side  of  at  least  one  tuple  of  the  relation. 

In  our  hash- incidence-vec tor  representation  this  set  is 
already  available  in  the  LHT  as  a  collection  of  LHT  records 
linked  to  each  other  by  their  TASE  links.  The  header  of  that 
structure  is  the  record  of  the  relation  in  the  relation 
table.  The  PFLM  field  of  that  record  points  at  the  first 


left  member  of  the  relation  which  is  the  beginning  record  of 
the  LEM  set  structure  that  we  look  for. 

Even  though  we  have  this  set  readily  available  we  have  to 
carry  it  into  the  SHT  in  order  to  make  the  resulting  set 
known  by  the  system.  We  write  the  algorithm  for  this 
operation  as  follows: 

1.  Hash  into  the  relation  table  with  the  identifier  of 
the  relation  in  question  and  find  the  record  of  the  relation. 

2.  Pollow  the  pointer  found  in  the  PPLM  field  of  this 
record  and  the  first  record  of  the  argument  relation's  LEM 
set. 

3.  Proceed  in  the  LEM  set  record  by  record.  Por  each 
record  found,  hash  into  the  SHT  with  the  individual  being 
represented  by  this  record  and  establish  its  SHT  record  in 
the  SHT  under  the  set  identifier  ”lem:R”.  Link  the  SHT 
records  created  in  this  manner  to  each  other  as  they  are 
created.  Keep  a  count  beginning  with  0  and  increment  it  for 
each  SHT  record  created. 

4.  Hash  to  the  set  table  with  the  set  identifier 
”lem:R";  establish  the  record  of  this  set  in  the  set  table. 
Put  the  pointer  to  the  first  record  of  the  resulting  set 
structure  (which  is  established  in  SHT)  into  the  PSS  field 
and  the  last  value  of  the  count  into  the  CARO  field  of  this 
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Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  o£  this  algorithm. 

We  can  write  the  time  complexity  function  of  this 
algorithm  as  follows: 

f  -  K*n  +  C 

where  the  constant  K  represents  the  constant  number  of  memory 
references  made  for  each  record  of  the  argument  relation's 
LEM  set  in  step  3.  The  constant  C  represents  the  constant 
number  of  memory  references  made  in  steps  1,  2  and  4.  The 
variable  n  is  the  cardinality  of  the  argument  relation's  LEM 
set. 

So  as  can  be  seen,  the  algorithm  has  the  worst  case 
asymptotical  time  complexity  behaviour  of  0(n). 

The  algorithm  that  works  on  the  table  representation 
makes  use  of  the  SCHT  mechanism  for  recognizing  and  removing 
duplicates  of  the  individuals  from  the  left  column  of  the 
table.  In  fact  if  there  was  not  the  possibility  of  the 
duplicates  in  the  left  columnf  we  could  strip  off  the  left 
column  of  the  table  without  using  the  SCHT  mechanism  in  order 
to  obtain  the  LEM  set  of  the  relation.  But  even  in  this  case 
the  algorithm  is  costly  and  has  the  same  asymptotical 
complexity  behaviour  as  the  algorithm  that  we  will  now 
define.  The  algorithm  is  as  follows: 

1.  Start  from  the  beginning  of  the  relation's  table; 
proceed  down  in  the  table  record  by  record,  by  following  the 
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links  between  the  records.  For  each  record  found  in  this 
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manner^  extract  the  left  individual/  hash  with  that 
individual  into  the  SCHT  and  if  there  is  no  record  present 
for  that  individual  in  the  SCHT/  establish  the  record  of  that 
individual  in  the  SCHT.  If  this  record  was  the  first  record 
created  mark  it  with  pointer  "P".  Link  the  records  created 
in  the  SCHT/  in  the  above  manner/  to  each  other  as  they  are 
created.  Keep  a  count  by  beginning  with  0  and  increment  it 
for  each  record  created  in  the  SCHT.  Continue  to  examine  the 
table  records  of  the  relation  until  the  end  of  the  relation 
is  encountered. 

2.  Start  from  the  beginning  of  the  set  created  in  the 
SCHT  by  following  the  pointer  *P"  which  is  set  in  step  1. 
For  each  set  record  found  by  proceeding  in  the  set  record  by 
record,  hash  into  the  SCHT  with  the  individual  being 
represented  by  that  record  and  find  the  SCHT  entry  to  which 
the  record  in  question  is  connected  directly  or  indirectly 
(i.e./  by  being  in  a  bucket  which  is  connected  to  that  hash 
table  entry).  Put  nil  into  the  hash  table  entry  found/  and 
put  nil  into  the  collision  link  fields  of  the  records  of  the 
bucket  if  there  exists  a  bucket  which  was  connected  to  this 
hash  table  entry. 

3.  Establish  a  record  of  the  resulting  set  in  the  set 
table  under  the  set  identifier: 

"lem:" (relation’s  identifier). 
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Put  the  pointer  "P"  into  the  PSS  field  of  that  record,  and 
put  the  last  value  of  the  count  into  the  CARD  field  of  that 
record. 
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We  write  the  worst  case  time  complexity  function  of  that 
algorithm  as  follows: 

f  ■  K*p  +  L*n  +  C 
where: 

P  ■  The  size  of  the  relation. 

n  ■  The  cardinality  of  the  LEM  set  of  the  relation. 

K  »  Constant  number  of  memory  references  made  for  each 
table  record  found  in  step  1. 

L  »  Constant  number  of  memory  references  made  for  each 
set  record  found  in  step  2. 

C  «  Constant  number  of  memory  references  made  in  step  3 
of  the  algorithm. 

By  assuming  that  the  LEM  set  and  the  RIM  set  of  the  relation 
have  the  common  cardinality  "n",  in  the  worst  case  (i.e., 
when  p  *  n*n)  we  can  rewrite  the  worst  case  time  complexity 
function  as: 

f  *  R*(n2)  +  L*n  +  C 

By  looking  at  the  exponent  of  the  term  with  the  larger 
exponent  in  the  above  function,  we  conclude  that  the  worst 
case  asymptotical  time  complexity  behaviour  of  that  algorithm 
is  0{n2). 
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Members  (mem:R)  : 

Members  of  a  relation  are  the  individuals  which  are 
either  in  the  LEM  set  or  in  the  RIM  set  of  the  relation  or 
both.  So  this  operation  takes  the  LEM  set  and  the  RIM  set  of 
the  relation  and  obtains  the  union  of  these  sets.  So  the 
algorithm  for  hash-incidence-vector  representation  can  be 
written  as: 

1.  Call  the  operation  "Lem"  with  the  relation  being  the 
argument. 

2.  Call  tne  operation  "Rim"  with  the  relation  being  the 
argument. 

3.  Call  the  operation  "Set  Onion"  with  the  arguments 
being  the  identifiers  of  the  LEM  set  and  the  RIM  set  of  the 
relation  which  are: 

"lem:" (relation's  identifier). 

"rim:" (relation's  identifier). 

4.  Establish  the  record  of  the  new  set  in  the  set  table 
under  the  set  identifier: 

"mem:" (relation  identifier) 

instead  of  the  identifier  created  by  the  "Set  Onion" 
operation  automatically.  So  if  we  say  the  complexity 
function  of  the  "Set  Onion"  operation  is  "F",  the  complexity 
function  of  the  Members  algorithm  can  be  written  as: 


f  -  P  +  fl  +  f2 
where: 

fl  *  The  complexity  function  of  the  "Lem"  operation. 
f2  ■  The  complexity  function  of  the  "Rim"  operation. 

Since  the  complexity  functions,  F,  fl,  and  f2  are  all  linear 
functions  the  sum  of  those  functions  will  also  be  a  linear 
function.  Thus  we  conclude  that  the  worst  case  asymptotical 
time  complexity  behaviour  of  this  algorithm  is  0(n),  where 
"n"  is  assumed  to  be  the  common  cardinality  of  the  LEM  set 
and  the  RIM  set  of  the  relation.  We  do  the  same  in  the  table 
representation  case,  but  let's  define  the  algorithm  for  the 
table  representation  because  the  situation  slightly  differs 
from  the  previous  case.  The  algorithm  for  the  table 
wi*  representation  is  as  follows: 

1.  Start  from  the  beginning  of  the  relation's  table. 
Proceed  down  in  the  table  record  by  record,  by  following  the 
links  between  the  table  records.  For  each  record  found  in 
this  manner  do  the  steps  below: 

a.  Extract  the  "right"  individual,  hash  into  the  SHT 
with  that  individual  under  the  new  set  identifier  "mem:R"  and 
create  its  record  in  the  case  there  is  no  record  for  that 
individual  in  the  SHT  already. 

b.  Extract  the  "left"  individual  of  that  record,  hash 
into  the  SHT  with  that  individual  under  the  new  relation's 
identifier  and  create  a  record  of  that  individual  in  the  SHT 
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in  the  case  that  individual  is  not  being  represented  by  a 
record  in  the  SHT  already.  Link  the  records  created  in  this 
manner  to  each  other  by  their  TASG  links  as  they  are  created. 
Keep  a  count  beginning  with  0  and  increment  it  for  each  SHT 
(set)  record  created.  Mark  the  first  SHT  record  created  in 
the  above  manner  with  pointer  "P". 

2.  Do  step  2  until  the  end  of  the  relation's  table  is 
encountered. 

3.  Establish  the  record  of  the  resulting  set  in  the  set 
table  under  the  identifier: 

"mem:1' (relation's  identifier). 

Put  pointer  "P"  into  the  PSS  field  of  that  record  and  put  the 
last  value  of  the  count  into  the  CARD  field  of  that  record. 

How  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

The  worst  case  time  complexity  function  of  that  algorithm 
can  be  written  as: 

f  »  K*p  +  C 
where: 

p  *  Relation  size/Table  size. 

K  *  Constant  number  of  memory  references  made  for  each 
table  record  found  in  the  steps  2  and  3,  which  is 
guaranteed  to  be  greater  than  or  equal  to  4. 

C  ■  Constant  number  of  memory  references  made  by  the 
housekeeping  operations. 
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He  know  that  in  the  worst  case  the  relation  may  be  equal  to 
the  cartesian  product  of  its  LEM  set  and  RIM  set,  so  by 
assuming  LEM  and  RIM  sets  of  the  relation  have  the  common 
cardinality  "n*  we  substitute  "n2"  in  place  of  "p"  in  the 
above  function ,  so  the  above  function  becomes: 

f  -  K*(n2)  +  C 

So  by  looking  at  the  exponent  of  the  term  with  the  larger 
exponent  in  the  above  function  we  conclude  that  the  algorithm 
has  the  worst  case  asymptotical  time  complexity  behaviour  of 
0(n2) . 

Relation  Intersection  (R6S)  : 

This  operation  takes  the  intersection  of  the  two 
relations.  The  result  is  a  relation  that  contains  those 
tuples  which  are  both  in  the  first  argument  relation  and  in 
the  second  argument  relation. 

Our  original  intend  was  to  take  the  intersection  of  the 
incidence  vectors  of  the  operand  relations  to  obtain  the 
resulting  relation's  incidence  vector.  But  the  nature  of  the 
relational  language  turned  out  to  be  very  dynamic;  the  left 
and  right  members  of  those  relations  may  be  in  an  arbitrary 
order  depending  on  the  order  in  which  their  records  have  been 
created.  As  a  result  of  that,  the  LET  and  RHT  records  take 
on  different  incidence  vector  indices  and  the  resulting 
incidence  vectors  may  have  different  structures,  i.e.,  the 
l's  in  the  corresponding  positions  may  mean  different  things 
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in  the  incidence  vectors.  In  addition  to  that/  in  our  system 
the  union,  intersection  and  difference  operations  on 
relations  are  defined  even  though  the  argument  relations  do 
not  have  the  same  LEM  sets  and  the  same  RIM  sets.  This  again 
does  not  let  us  use  the  union/  intersection  and  difference  of 
the  incidence  vectors  of  the  operand  relations  to  obtain  the 
incidence  vector  of  the  resulting  relation.  One  way  to  make 
use  of  the  incidence  vectors  as  intended  originally  is  to 
enforce  these  restrictions  in  the  definition  of  the  language: 

Let  R  and  S  be  the  operand  relations: 

1.  R  and  S  must  have  exactly  the  same  LEM  sets  and  the 
same  RIM  sets. 

2.  The  set  elements  must  be  in  order  and  should  always 
be  maintained  in  that  order. 

If  we  impose  those  restrictions  on  operand  relations,  it  is 
guaranteed  that  the  1's  in  the  corresponding  positions  of  the 
incidence  vectors  mean  the  same  thing,  and  then  it  becomes 
possible  to  utilize  fast  logical  operations  and  pipelining  on 
the  incidence  vectors.  But  the  asymptotical  behaviour  of  the 
algorithms  remains  the  same,  because  the  size  of  the 
incidence  vector  is: 

(n2)/cl 

where  "n"  is  defined  to  be  the  cardinality  of  both  LEM  and 
the  RIM  set  of  the  relation  and  "Cl”  is  the  memory  word 
length.  Me  can  decrease  the  cost  by  another  constant  factor 
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which  comes  from  pipelining.  So  the  complexity  function  of 
the  algorithm  becomes  roughly: 

f  -  (n*n)/(Cl*C2)  +  C 
where  C2  is  the  pipelining  factor. 

In  the  present  case,  the  language  does  not  have  the 
restrictions  explained  above  and  our  algorithms  will  be 
defined  according  to  the  present  definition  of  the  language. 
We  will  see  that  worst  case  asymptotical  time  complexity 
behaviour  of  the  algorithms  will  not  change  but  the 
algorithms  will  be  slightly  inefficient. 

We  can  express  the  intersection  of  two  relations  as  given 
below: 

R6S  ■  P  *  {<x,y>  |  <x,y>  e  R  and  <x,y>  e  S  } 

Thus  our  algorithms  are  supposed  to  produce  the  relation  P 
that  satisfies  the  above  condition,  given  the  relations  R  and 
S  as  arguments. 

The  algorithm  for  Hash- Incidence -Vector  representation  is 
given  below: 

Let  R  be  the  first  operand  relation  and  S  be  the  second 
operand  relation. 

1.  Get  the  operand  relations'  identifiers. 

2.  Hash  with  the  first  and  the  second  argument 
relations'  identifiers  to  the  relation  table  (RT),  follow  the 
pointers  found  in  the  PFLM  and  PFRM  fields  of  the  relation 


R's  record,  and  find  the  records  of  the  first  left  member  and 
the  first  right  member  of  the  relation  R. 

a.  Start  from  the  beginning  of  the  LEM  set  of  the 
relation  R.  For  each  record  found  by  following  the  TASE 
links  of  the  LEM  records,  until  the  LEM  set  is  exhausted, 
hash  into  the  LHT  with  the  individual  in  question  under  the 
relation  S.  Check  if  a  record  of  that  individual  is  present 
in  the  LEM  set  of  relation  S,  if  so  hash  into  the  LHT  under 
the  new  relation's  identifier  (which  is:  "R&S").  Establish 
a  copy  of  that  individual's  record.  Link  the  records  copied 
in  this  manner  to  each  other  as  they  are  created  by  their 
TASE  links. 

b.  Repeat  step  2-a  for  the  RIM  set  of  the  relation  R 
on  the  RHT,  by  also  looking  up  the  RIM  set  of  the  relation  S. 
Keep  a  count  for  the  new  RIM  set  being  created  and  increment 
it  for  each  individual  detected  to  be  in  the  set.  Put  the 
updated  value  of  that  count  into  the  index  field  of  the 
record  that  represents  the  individual  which  has  been  detected 
to  be  in  the  set. 

c.  Start  from  the  beginning  of  the  new  LEM  set  and 
establish  the  records'  index  fields  which  are  connected  to 
each  other  by  their  TASE  link  fields  by  beginning  from  1  and 
incrementing  the  index  by  the  last  value  of  the  count 
maintained  in  step  2-b.  Furnish  the  index  fields  of  these 
records. 


d.  Establish  the  record  of  the  new  relation  in  the 
relation  table  under  the  new  relation  identifier.  Establish 
the  cardinalities  obtained  by  keeping  count  during  the 
creation  of  new  LEM  set  and  new  RIM  set  into  the  N|LEM|"  and 
■  |  RIM  |  ■  fields  of  that  relation  record  respectively. 
Establish  the  pointers  to  the  first  records  of  the  new  LEM 
and  RIM  sets  into  the  PFLM  and  PFRM  fields  of  that  record. 
Allocate  a  block  of  memory  of  size  ( |  LEM  |  *  |  RIM  |  )/C,  where  C 
is  the  memory  word  length.  Put  the  beginning  address  of  that 
block  into  the  base  field  of  the  relation's  record. 

3.  Proceed  in  the  LEM  set  of  relation  R,  record  by 
record  by  following  the  TASE  link  fields  of  records.  For 
each  record  found/  extract  the  individual  being  represented 
by  that  record  and  do  the  steps  below. 

a.  By  starting  from  the  beginning  of  the  RIM  set  of 
the  relation  R,  proceed  down  in  the  RIM  set  record  by  record, 
by  following  the  TASE  links  between  the  records.  For  each 
record  found  in  this  manner  extract  the  individual  being 
represented  by  that  record.  Check  the  incidence  vector 
location  corresponding  to  the  tuple  found  in  step  3  and  step 
3-b  to  see  if  it  contains  1.  If  not,  do  nothing,  else 
continue  with  the  steps  below. 

b.  For  each  pair  of  individuals  found  in  step  3  and 
step  3-a,  hash  into  the  LHT  with  the  individual  found  in  step 
3  and  hash  into  the  RHT  with  the  individual  found  in  step  3-b 


under  the  relation  S  and  check  if  this  tuple  is  also  present 
in  the  relation  S  (by  using  the  reference  algorithm). 

c.  If  so  hash  into  the  LHT  and  RHT  with  those 
individuals  but  this  time  under  the  new  relation  identifier. 
Using  the  reference  algorithm ,  find  the  new  incidence  vector 
location  that  corresponds  to  this  tuple  and  set  that  bit  to 
1. 

d.  Else  do  nothing. 

Note  that  the  R&S  and  S&R  refers  to  the  same  relation  and 
when  we  create  a  relation  with  identifier  "RtS"  and  establish 
relation's  record  in  the  relation  table  with  that  identifier 
a  subsequent  reference  to  the  S&R  may  cause  the  same  relation 
to  be  reconstructed  redundantly.  In  order  to  eliminate  this 
possibility  we  will  accept  a  convention  and  design  the  system 
so  that  whenever  R&S  or  S&R  is  referenced,  we  first  look  in 
the  RT  by  hashing  with  the  identifier  "R&S".  If  no  record  is 
present,  then  we  hash  with  identifier  "S&R*.  If  a  record  is 
present,  we  assume  the  original  reference  is  S&R  instead  of 
R&S.  From  that  point  on  the  relation  S&R  participates  in 
operations  instead  of  R&S  vice  versa. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

Note  that  the  given  algorithm  reflects  the  worst  case, 
which  means  neither  R&S  nor  S&R  has  been  constructed 


previously.  The  worst  case  complexity  faction  of  this 
costly  algorithm  can  be  written  as: 

f  »  K*m  +  L*n  +  Y*t  +  T*m*n  +  C 
where  first  term  corresponds  to  step  2-a,  second  term 
corresponds  to  step  2-b,  third  term  corresponds  to  step  2-c, 
fourth  term  corresponds  to  step  3 ,  "m"  is  the  cardinality  of 
the  LEM  set  of  the  relation  R,  "n"  is  the  cardinality  of  the 
RIM  set  of  the  relation  R,  "t"  is  the  cardinality  of  the  LEM 
set  of  the  resulting  relation.  Constant  C  represents  the 
number  of  memory  references  made  in  the  steps  other  than  the 
steps  indicated  above.  Constant  K  represents  the  number  of 
memory  references  made  for  each  LEM  set  record  in  step  2-a. 
Constant  L  represents  the  number  of  memory  references  made 
for  each  RIM  set  record  in  step  2-b.  Constant  7  represents 
the  number  of  memory  references  made  for  each  LEM  set  record 
of  the  resulting  relation  in  step  2-c.  Constant  T  represents 
the  number  of  memory  references  made  for  each  pair  of 
individuals  found  in  step  3.  Now  let  m*n»t  and  Z-Ia+L+Y) , 
the  complexity  function  becomes: 
f  ■  T*n*n  +  Z*n  +  C 

So  the  complexity  function  can  be  viewed  as  a  second  degree 
polynomial.  We  conclude  that  the  algorithm  has  the  worst 
case  asymptotical  time  complexity  behaviour  of  0(n2). 


Mow  we  have  to  consider  how  we  could  perform  this 
operation  in  the  case  of  the  table  representation.  The 
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algorithm  is  given  below: 

1.  Take  the  relation  with  smaller  size  and  make  a 
separate  copy  of  it. 

2.  Start  from  the  beginning  of  the  second  relation's 
table  and  proceed  down  in  the  table.  For  each  record  found , 
extract  the  left  individual  and  search  for  that  individual  in 
the  left  column  of  the  new  table.  If  it  is  found  compare  the 
right  individuals  of  the  current  record  of  the  new  table  and 
the  record  of  the  second  relation  above.  If  a  common  tuple 
is  found  in  this  manner ,  extract  the  new  table's  record 
representing  that  individual  from  its  place  and  carry  it  to 
the  top  of  the  table.  If  it  is  the  first  record  carried  to 
the  top,  mark  it  with  pointer  Z. 

3.  Repeat  step  2  until  the  table  of  the  second  relation 
is  exhausted  or  until  the  pointer  Z  points  at  the  bottom  of 
the  new  table. 

4.  Delete  the  records  below  the  Z  (if  there  are  any) . 

In  the  best  case  two  operand  relations  may  almost  be  the 
same.  By  factoring  out  the  constants,  the  complexity 
function  can  be  written  as: 

f  •  p  +  (p-l)  +  (p-2)  + .  +  (p  -  p  +  1) 

where  ”p”  refers  to  the  size  of  both  the  operand  relations. 
This  can  be  rewritten  as: 


284 


(p-p+k) 


p*(p+l)/2 


k-1  k-1 

So  we  know  that  we  can  not  do  better  than  that.  Now  we  have 
to  decide  about  the  worst  case.  In  the  worst  case  the  operand 
relations  may  be  disjoint/  in  which  case  for  each  tuple  of 
the  first  relation  we  go  through  exhaustively  the  whole 
copied  relation  and  we  delete  the  copied  relation  as  a  whole 
at  the  end.  So  the  worst  case  complexity  function  becomes: 

f  *  K*p*r  +  L*r  +  M*r  +  C 

where  Constant  K  is  the  number  of  memory  references  made  in 
each  iteration  of  step  2,  constant  L  is  the  number  of  memory 
references  made  wn^le  copying  each  record  of  the  second 
relation  in  step  1/  constant  M  is  the  number  of  memory 
references  made  while  deleting  each  record  of  the  copied 
relation  in  step  4,  constant  C  is  the  number  of  memory 
references  made  by  the  housekeeping  operations/  variables  p 
and  r  are  the  sizes  of  the  operand  relations. 

We  know  that  in  the  worst  case  both  operand  relations  may 
be  universal  relations  on  their  LEM  and  RIM  sets.  In  that 
case/  as  explained  before/  the  sizes  of  the  relations  are  the 
product  of  the  cardinalities  of  their  LEM  and  RIM  sets  re¬ 
spectively.  Let  n  be  the  common  size  of  the  LEM  and  RIM  sets 
of  both  relations  and  let  the  operand  relations  be  the  uni¬ 
versal  relations  on  their  LEM  and  RIM  sets  or  in  other  words# 


1 

j 


p  *3* 


let  the  operand  relations'  sizes  be  equal  to  the  cartesian 
product  of  their  LEM  and  RIM  seta,  which  in  turn  means: 

p  »  q 

The  complexity  function  becomes: 

f  »  K*  (n4)  +  F*  (n2)  +  C 
where  F  *  L  +  M 

We  conclude  that  the  algorithm  has  the  worst  case 
asymptotical  time  complexity  behaviour  of  0(n4),  which  is  a 
very  costly  algorithm.  Again  there  exists  many  efficient 
algorithms  to  do  this  operation  but  they  are  not  compatible 
with  the  dynamics  of  our  system,  and  they  have  other 
redundancies  when  they  are  combined  with  the  whole  system. 
For  example,  if  we  maintain  our  tables  representing  relations 
in  sorted  order  this  algorithm  may  be  made  simpler  and 
cheaper  by  using  one  of  the  fast  searching  algorithms,  but 
maintaining  the  tables  in  sorted  order  is  a  significant 
burden  in  such  a  dynamic  system,  because  we  may  recompute  the 
individuals  in  the  relation  and  we  may  add  new  tuples  as  a 
result  of  the  relational  operations,  and  so  on.  These 
operations  are  so  frequent  that  every  time  sorting  the  tables 
is  a  significant  burden. 

Relation  Union  (R|S)  : 

This  operation  takes  two  relations  and  produces  a 
relation  in  which  each  tuple  is  either  present  in  one  operand 
relation  or  in  the  other.  That  can  be  stated  formally  as: 


286 


RjS  *  {  <x,y>  |  <x,y>  e  R  OR  <x,y>  e  S  } 

The  set  of  tuples  such  that  each  tuple  is  either  in  R  or  in 
S. 

The  algorithm  for  Hash- Incidence-Vector  representation  is 
as  given  below: 

Let  the  first  operand  relation  be  R  and  the  second 
operand  relation  be  S. 

1.  Find  the  records  of  the  relations  R  and  S  in  the 
relation  table. 

2.  Follow  the  PFLM  and  PFRM  fields  of  the  relation  R's 
record,  find  the  first  left  member's  record  and  the  first 
right  member's  record  in  the  LHT  and  in  the  RHT  respectively. 

3.  Repeat  step  2  for  relation  S. 

4.  Copy  the  RIM  set  of  relation  R  in  the  RHT  under  the 
relation  identifier  "R|S"  (as  it  was  done  in  many  previous 
algorithms).  Copy  the  RIM  set  of  relation  S  under  the 
relation  identifier  "RjS".  As  the  records  are  created  during 
the  copying  operation,  establish  the  new  indices  in  the  index 
fields  of  the  records  by  keeping  an  index  count  and 
incrementing  it  for  each  record  created,  then  by  putting  the 
current  value  of  it  into  the  index  field  of  the  record 
created  recently.  Link  the  records  created  in  the  above 
manner  to  each  other  by  their  TASE  link  fields. 

5.  Repeat  step  4  for  the  LEM  sets  of  relations  R  iv 
on  the  LHT.  But  this  time,  keep  one  count  for  estsr 
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indices  and  another  count  for  finding  out  the  cardinality  of 
the  new  LEM  set.  While  establishing  the  indices  of  the 
records*  begin  with  1  and  increment  the  index  count  by  the 
cardinality  of  RIM  set  for  each  record  created*  and  put  the 
updated  value  of  the  index  count  into  the  index  field  of  the 
record  recently  created.  For  each  record  created  increment 
the  cardinality  count. 

6.  Establish  the  new  relation's  record  in  the  relation 

table  under  the  identifier  "R|S”.  Establish  the  LEM  set 

/ 

cardinality  count  in  the  ]XiBM|  field  and  the  RIM  set  index 
count  in  the  |RXM|  field  of  that  record.  Put  the  pointers  to 
the  records  of  the  first  left  member  and  the  first  right 
member  into  the  PFLM  and  PFRM  fields  of  that  record 
respectively.  Allocate  a  memory  block  as  large  ast 

(|LBft|*|RXM|)/C 

where  C  is  the  memory  word  length.  Put  the  beginning  address 
of  that  block  into  the  base  field  of  the  above  record. 

7.  Start  from  the  beginning  of  the  LBN  set  of  relation 
R.  For  each  record  found  by  following  the  TASE  links  between 
the  records  until  the  LBN  set  is  exhausted;  do  the  steps 
below. 

a.  Start  from  the  beginning  of  RZN  set  of  relation  R* 
follow  the  TASE  links  between  the  records*  and  proceed  down 
in  the  RXN  set*  record  by  record. 


b.  Foe  each  tuple  found  which  is  being  represented  by 
the  record  pair  found  in  step  7  and  in  step  7-a#  reference  the 
incidence  vector  of  relation  R  (by  using  the  reference 
algorithm).  If  the  corresponding  incidence  vector  entry  is 
1#  hash  with  the  left  individual  of  the  tuple  into  the  LIT 
and  with  the  right  individual  of  the  tuple  to  the  RHT. 
Extract  the  indices  of  the  corresponding  records#  then 
reference  the  incidence  vector  of  the  new  relation  by  using 
the  reference  algorithm  and  put  1  into  the  incidence  vector 
entry  found. 

8.  Repeat  step  7  for  the  relation  S. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

The  algorithm  seems  to  be  expensive#  especially  steps  7 
and  8  are  the  costly  steps  of  this  algorithm.  We  write  the 
weist  case  complexity  function  of  this  algorithm  as  follows: 

f  ■  K*(m+p)  +  t#(n+q)  +  T*(m*n)  +  P*(p*q)  +  C 
where: 

m  *  cardinality  of  the  LEM  set  of  R. 

n  *  cardinality  of  the  RIM  set  of  R. 

p  *  cardinality  of  the  LEM  set  of  S. 

and  q  -  cardinality  of  the  RIM  set  of  S. 

Constants  in  front  of  each  term  indicate  the  number  of  memory 
references  made  for  each  iteration  of  the  corresponding  step. 
The  correspondence  between  the  steps  of  the  algorithm  and  the 


terms  of  the  function  as  follows:  the  first  term  corre¬ 
sponds  to  step  4,  the  second  term  corresponds  to  step  5,  the 
third  term  corresponds  to  step  7,  and  the  fourth  term  corre¬ 
sponds  to  step  8  of  the  algorithm*  Constant  C  is  the  number 
of  memory  references  made  by  the  other  steps  of  the  algo¬ 
rithm* 

Let: 

n  *  m  *  p  *  q  and 
R  »  2*K  +  2*L  and 
0  *  T  +  P. 

Then  the  complexity  function  becomes: 
f  -  U*(n2)  +  R*n  ♦  C 

So  we  conclude  that  the  worst  case  asymptotical  complexity 
behaviour  of  the  algorithm  is  0(n2). 

How  could  we  perform  the  same  operation  on  the  table 
representation?  Again  we  have  to  get  help  from  SCHT 
mechanism  in  order  to  make  the  algorithm  as  efficient  as  we 
can*  The  algorithm  is  as  follows: 

Let  R  be  the  one  operand  relation  and  S  be  the  other. 

1.  Start  from  the  beginning  of  the  table  of  relation  R, 
proceed  down  in  the  left  column  of  the  table ,  record  by 
record,  for  each  left  individual  found  in  that  way,  hash  with 
that  individual  into  the  SCHT,  create  a  record  of  that 
individual  in  SCHT  and  connect  it  directly  to  the  hash  table 
entry  found.  If  it  is  found  out  that  a  record  of  a  left 


individual  is  already  connected  directly  to  this  hash  table 
entry  (i.e.#  if  the  collision  occurs)#  search  for  the  next 
enpty  hash  table  entry  in  the  hash  table#  and  connect  the 
record  of  the  individual  in  question  to  that  hash  table  entry 
directly.  If  a  record  of  that  individual  is  already 
established  in  the  SCHT  previously#  do  nothing.  Create  an 
SCHT  record  of  the  right  individual  of  the  current  tuple# 
connect  it  to  the  record  of  the  left  individual  created 
above.  If  there  is  a  bucket  of  records  connected  to  the  left 
individual's  record  (as  a  result  of  establishment  of  previous 
tuples)#  add  the  record  of  right  individual  in  question  to 
the  end  of  that  bucket.  For  each  hash  table  entry  found  and 
used  up  in  this  Banner#  set  a  pointer  to  that  hash  table 
entry  and  put  that  pointer  into  the  temporary  array  of  type 
pointer . 

2.  Repeat  step  1  for  relation  S;  establish  the  records 
of  the  right  individuals  in  the  buckets  connected  to  the 
records  of  the  left  individuals  if  they  are  not  already 
present  in  the  buckets  (i.e.#  do  not  allow  repetition  of  same 
right  individual's  record  in  the  saae  bucket).  (*  In  the 
above  steps#  we  handled  the  collisions  by  rehashing  and  we 
use  bucketing  to  relate  the  right  individuals  with  the  left 
individuals.  In  fact  we  created  another  representation  of 
the  resulting  relation  in  SCBT.  As  we  know  this 


representation  technique  is  called  an  "Adjacency  list".  Mow 


the  remaining  ataps  of  the  algorithm  are  to  convact  that 
caprasantation  into  ouc  "table"  representation.  *) 

3.  Start  froa  the  beginning  of  the  temporary  pointer 
array  and  find  each  occupied  hash  table  entry  in  turn.  For 
each  bucket  found  connected  to  this  hash  table  entry*  do 
these  steps * 

a.  Extract  the  first  record  froa  the  bucket*  create  a 
table  record  and  put  the  PML  field  of  the  record  extracted 
froa  the  bucket  into  the  "left”  field  of  that  table  record. 

b.  Extract  the  next  record  froa  the  bucket*  and  copy 
the  PNL  field  of  that  record  into  the  "right"  field  of  the 
table  record. 

c.  While  there  remains  a  record  in  the  bucket*  create 
a  new  table  record*  copy  the  PML  field  of  the  record  found  in 
step  3-a  into  the  "left"  field*  and  the  PML  field  of  the 
reaaining  record  into  the  "right"  field  of  the  table  record 
created • 

4.  Link  the  table  records  created  in  step  3  to  each 
other  by  their  TASK  link  fields  as  they  are  created. 

Mow  we  will  do  the  worst  case  asyaptotical  time 
complexity  analysis  of  this  algor itha. 

In  the  worst  case  each  relation  aay  be  universal 
relations  on  their  LEM  and  rxm  sets.  In  addition  to  that* 
the  LEM  sets  of  the  arguaent  relations  aay  be  the  same*  but 
the  HIM  sets  aay  be  disjoint.  This  also  iaplies  that  the 


relations  are  disjoint.  Under  these  considerations,  in  step 
2  the  algorithm  searches  the  entire  bucket  and  can  not  find 
the  record  of  the  right  individual  in  question.  It  then  adds 
the  record  of  that  individual  to  the  end  of  the  bucket.  We 
have  to  note  that  in  the  worst  case  indicated  above  each 
bucket  has  as  a  any  records  as  the  cardinality  of  the  RIM  set 
of  the  relation  R.  Because  the  LEM  sets  of  the  relations  are 
the  sane,  each  bucket  constructed  in  SCHT  is  searched  in  the 
■anner  explained  above.  In  addition  to  that,  after 
teraination  of  steps  1  and  2  the  resulting  bucket  sixes  are 
the  total  of  the  cardinalities  of  the  RIM  sets  that  belong  to 
the  relations  R  and  S. 

The  worst  case  complexity  function  of  this  algorithm  can 
be  written  as: 

f  -  K*m*n  +  L*(p*q*n)  ♦  T«m*(n4q)  +  c 
where: 

m  *  cardinality  of  the  LEM  set  of  relation  R. 

n  -  cardinality  of  the  RIM  set  of  relation  R. 

p  *  cardinality  of  the  LEM  set  of  relation  S  ■  a 

q  *  cardinality  of  the  RIM  set  of  relation  S 

In  the  above  function: 


first  term  corresponds  to  step  1. 

second  term  corresponds  to  step  2. 

third  term  corresponds  to  step  3  of  the  algorithm 


n  »  n  *  p  *  q  and  Z  ■  2*T  +  R,  the  complexity  function 
becomes: 

f  -  L*(n3)  ♦  Z*(n2)  ♦  C 

So  we  conclude  that  the  worst  case  asymptotical  time 
complexity  of  that  algorithm  is  0(n3). 

Relation  Difference  (R  -  S)  : 

This  operation  takes  two  relation  identifiers  as  argument 
and  produces  another  relation  which  has  only  those  tuples 
that  are  in  the  first  operand  relation  and  not  in  the  second 
operand  relation.  This  can  be  formally  stated  as: 

R  -  S  ■  {<x,y>  1  <x#y>  e  R  and  not<x,y>e  S  } 

Algorithm  for  Rash-Incidence-Vector  representation  is  given 
below: 

1.  Get  the  relations*  identifiers  and  record  the 
identifier  of  the  first  argument  relation  as  reference. 

2.  Find  the  records  of  the  relations  by  hashing  with 
their  identifiers  to  the  Relation  Table. 

3.  Follow  the  pointers  in  the  PFLM  and  PFRM  fields  of 
the  reference  relation's  record:  find  the  first  LEM  and  first 
RIM  records  of  that  relation. 

4.  Proceed  in  the  LBM  set  of  the  relation:  for  each 
record  found,  do  the  steps  below: 

a.  By  starting  from  the  first  record  in  the  RIM  set 
of  the  reference  relation,  proceed  down  in  the  RIM  set. 


b.  For  each  tuple  found  being  represented  by  the 
records  found  in  steps  4  and  4-a,  hash  with  the  left 
individual  of  the  tuple  into  the  LHT  and  with  the  right 
individual  to  the  RHT  under  the  second  relation's  identifier. 
Test  if  that  tuple  is  already  present  in  the  second  relation 
or  not  (by  using  the  reference  algorithm). 

c.  If  so  do  nothing. 

d.  Else  hash  with  the  left  individual  in  question 
into  the  LHT  and  with  the  right  individual  in  question  to  the 
RHT,  under  the  new  relation's  identifier,  which  is  ”R-S”, 
where  R  is  the  identifier  of  the  reference  relation  and  S  is 
the  identifier  of  the  other  operand  relation.  Establish  the 
LHT  and  RHT  records  of  that  individual  in  the  LHT  and  in  the 
RHT  respectively.  If  they  are  the  first  records  established 
in  this  manner  set  pointer  P  to  the  left  individual's  record 
(in  the  LHT)  and  set  pointer  to  the  right  individual's 
record  (in  the  RHT).  Link  the  records  created  in  this  manner 
to  each  other  as  they  are  created. 

e.  Keep  a  RIM  set  index  count  to  furnish  the  index 
fields  of  the  RIM  records  and  update  it  as  the  records  are 
created.  Keep  LEM  set  cardinality  count  for  finding  out  the 
cardinality  of  the  resulting  LEM  set;  increment  it  when  each 
LEM  record  is  created. 

5.  Repeat  step  4  until  the  LEM  set  of  reference  relation 
is  exhausted. 


6.  Bash  to  the  relation  table  with  the  new  relation's 
identifier  ("R-S"),  establish  its  record,  put  pointer  P  into 
the  P7LM  field,  pointer  Q  into  PFRM  field,  the  last  value  of 
the  L8M  set  cardinality  count  into  the  "|lem|"  field,  and  the 
last  value  of  the  RIM  set  index  count  into  the  "|RIM|"  field 
of  this  record.  Allocate  a  block  of  aeaory  as  large  as: 

<|LBt|*|RIM|)/C 

Where  C  is  the  aeaory  word  length.  Put  the  beginning  address 
of  that  block  into  the  base  field  of  the  relation's  record. 

7.  Start  froa  the  beginning  of  the  LEM  set  of  the  new 
relation,  proceed  down  in  the  LEM  set  of  new  relation  record 
by  record.  By  beginning  with  1  and  increaenting  the  index 
every  tiae  by  the  last  value  of  the  RIM  index  count,  furnish 
the  index  fields  of  the  LEM  records.  In  addition,  for  each 
record  found  do  the  steps  below: 

a.  By  starting  froa  the  first  record  of  the  RIM  set 
of  the  new  relation  proceed  down  in  the  RIM  set  record  by 
record . 

b.  For  each  tuple  found  (being  represented  by  the 
record  pair  found  in  steps  7  and  7-a)  hash  with  the  left 
individual  into  the  LHT  and  with  the  right  individual  to  the 
RET  under  the  reference  relation  and  reference  the  incidence 
vector  location  corresponding  to  that  tuple  by  using  the 
"reference”  algorithm.  If  a  1  is  found  in  the  corresponding 
entry,  reference  the  second  relation  in  the  saae  aanner  to 


check  if  this  tuple  is  also  present  in  the  second  relation. 
If  so  do  nothing;  else  put  1  into  the  corresponding  incidence 
vector  entry  of  the  new  relation. 

We  could  write  an  algorithm  which  is  less  costly  as 
follows:  It  makes  a  separate  copy  of  the  incidence  vector 
and  the  LEM  and  RIM  sets  of  the  reference  relation,  then  it 
goes  through  the  second  relation  and  deletes  the  entries  from 
the  copied  incidence  vector  that  corresponds  to  the  tuples 
found  in  the  second  relation.  But  suppose  that  the  reference 
relation  is  a  very  large  relation  and  the  intersection  of  two 
operand  relations  is  also  very  large  so  that  the  result  of 
the  operation  is  a  relation  that  has  only  a  few  tuples.  As 
can  be  seen,  we  are  allocating  an  incidence  vector  as  large 
as  the  reference  relation's  incidence  vector  for  a  few 
tuples;  we  are  wasting  a  lot  of  storage.  On  the  other  hand 
our  previous  algorithm  uses  an  amount  of  memory  as  large  as 
needed . 

Wow  we  will  do  the  worst  case  asymptotic;  ^  time 
complexity  analysis  of  this  algorithm. 

Clearly  the  algorithm  is  an  expensive  algorithm,  the 
worst  case  complexity  function  can  be  written  as: 

f  •  KMm  *  n)  +  L*(p  *  q)  +  C 

Where  the  first  term  corresponds  to  step  4,  and  the  second 
term  corresponds  to  step  7  of  the  algorithm.  Constant  C  is 
the  number  of  memory  references  made  by  the  other  steps  of 


the  algorithm.  Variable  m  is  the  cardinality  o£  the  LEM  set 
of  relation  R,  variable  n  is  the  cardinality  of  RIM  set  of 
relation  R,  variable  p  is  the  cardinality  of  the  LEM  set  of 
the  resulting  relation,  variable  q  is  the  cardinality  of  RIM 
set  of  resulting  relation.  Let  m*n«p*q  and  T  *  K+L,  then  the 
complexity  function  becomes: 

f  *  T*n*n  +  C 

So  %re  conclude  that  the  algorithm  has  the  worst  case 
asymptotical  time  complexity  behaviour  of  0(n2). 

The  algorithm  for  table  representation  is  a  costly 
algorithm,  as  it  was  before  in  the  other  operations.  The 
dynamics  of  the  system  causes  the  individuals  to  be  out  of 
order  in  the  left  and  the  right  column  of  the  table.  This, 
in  turn,  causes  exhaustive  searches  in  the  columns  of  the 
table.  Even  though  the  deficiency  is  obvious,  we  might 
wonder  how  inefficient  the  algorithm  is  relative  to  the 
previous  algorithm.  The  algorithm  is  given  below: 

1.  Start  from  the  beginning  of  the  table  of  the 
reference  relation.  Make  a  separate  copy  of  that  table. 

2.  Start  from  the  beginning  of  the  other  operand 
relation's  table.  For  each  tuple  found  by  proceeding  through 
the  table,  record  by  record,  search  in  the  copy  of  the 
reference  relation  for  that  tuple;  if  it  is  present  then 
delete  it  from  the  copy  of  the  reference  relation's  table. 


3.  Repeat  step  2  until  the  relation  (other  than  the 
reference  relation)  is  exhausted. 

In  the  worst  case  the  relation  R  and  the  relation  S  may 
be  disjoint  (i.e.,  R-S  *  null  relation).  That  means  for 
every  tuple  found  in  step  2  we  go  through  the  copy  of  the 
reference  relation  exhaustively  (because  there  is  no  tuple  in 
common  the  search  is  unsuccessful  each  time).  Under  this 
circumstance  the  worst  case  complexity  function  can  be 
written  as: 

f  *  K*p  +  L*r*p  +  C 

The  first  term  corresponds  to  step  1 ,  and  the  second  term 
corresponds  to  steps  2  and  3,  of  the  algorithm. 

In  the  above  function,  "p"  is  the  size  of  the  reference 
relation,  "r"  is  the  size  of  the  other  operand  relation  and 
constant  C  is  the  number  of  memory  references  made  by  the 
housekeeping  operations  (such  as  updating  relation  table, 
etc.).  Constants  K  and  L  represents  the  constant  number  of 
memory  references  made  at  each  iteration  of  step  1  and  step 
2,  which  are  expected  to  be  small. 

Let  the  cardinality  of  the  LEM  sets  and  the  RIM  sets  of 
the  relations  be  the  same  and  equal  to  "n".  Let  the 
relations  be  the  universal  relations  on  their  "MEM”  set  (MEM 
*  LEM  |  RIM).  That  means  the  size  of  the  relations  are  the 
same  under  these  assumptions.  The  worst  case  complexity 
function  becomes: 


f  -  K*(n2)  +  L*(n4)  +  C 

So  the  algorithm  has  the  worst  case  asymptotical  time 
complexity  behaviour  of  0(n4).  Assuming  the  absence  of  the 
constant  factors*  160*000  memory  references  are  necessary 
when  n»20.  On  this  basis  we  may  say  that  the  algorithm  is 
practically  inexecutable.  We  have  to  remember  from  previous 
discussions  that  keeping  the  tables  in  sorted  order  is  not  a 
solution  in  the  present  definition  of  the  system. 

Restriction  Operation  (S/R\S)  : 

i 

It  is  sometimes  useful  to  restrict  both  the  domain  and 
the  codomain  of  a  relation.  The  restriction  operation*  given 
an  argument  set  and  the  relation  identifier*  restricts  the 
RIM  set  and  the  LEM  set  of  that  relation  to  the  given  set. 
That  means  the  RIM  set  and  the  LEM  set  of  the  relation  can 
contain  only  those  individuals  that  are  in  the  argument  set. 
We  can  state  it  more  carefully  as  follows: 

S/R \S  »  {<y,x>|  yRx  a  y*s  a  xeS  } 

The  algorithm  for  Hash-Incidence-Vector  representation  is  as 
follows : 

1.  Get  the  relation  identifier  and  the  argument  set 
identifier  and  hash  with  those  identifiers  to  the  relation 
table  and  the  set  table  respectively.  Find  the  records  of 
the  first  right  member  and  the  first  left  member  of  the 
relation  by  following  the  pointers  found  in  the  PFRM  and  the 
PFIM  fields  of  the  relation's  record  respectively. 


Ill 


2.  if  the  argument  set  is  extensionally  represented 
(i.e.,  the  PSS  field  is  not  nil),  start  from  the  beginning  of 
the  linked  list  structure  of  the  RIM  set,  proceed  down  in 
that  linked  list*  For  each  rim  set  record  found  by  following 
the  TASK  links  between  the  records,  hash  with  the  individual 
being  represented  by  that  record  into  the  SHT,  under  the 
argunent  set's  identifier,  to  test  if  there  exists  an  SHT 
record  for  that  individual  in  the  SHT.  If  there  exists  a 
record  for  that  individual  in  the  (SHT)  argument  set,  copy 
that  record  in  the  RHT  under  the  new  relation  identifier: 

(set  identifier) '/'(relation  identif ier)'\'(set  identifier) 
Link  the  RHT  records  to  each  other  as  they  are  created.  Keep 
a  RIM  set  cardinality  count  and  increnent  it  each  tine  a  RIM 
record  is  created  by  beginning  with  0. 

3.  After  the  RIM  set  is  exhausted  do  step  2  for  the  LBM 
set  of  the  relation  in  the  LHT. 

4.  if  the  argunent  set  is  being  intentionally 
represented,  begin  fron  the  beginning  of  linked  structure  of 
the  LBM  set  in  the  LHT.  Proceed  down  in  the  LBM  set  record 
by  record  by  following  the  TASB  links  between  the  records  and 
for  each  RIM  set  record  found  in  this  manner,  test  if  the 
individual  being  represented  by  that  record  is  a  member  of 
the  argument  set.  (This  membership  test  will  be  explained 
later  in  the  discussion  of  the  intensional  representation 
techniques).  If  it  is  a  member,  copy  that  RHT  record  in  the 
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RHT  under  the  new  relation  identifier  indicated  above.  Link 
the  RHT  records  created  in  the  RHT  in  the  above  manner  to 
each  other  as  they  are  created.  After  finishing  with  the  LEM 
set  of  the  relation,  begin  from  the  beginning  of  the  RIM  set 
of  the  relation  and  do  the  same  as  it  was  done  for  the  LEM 
set,  but  this  time  for  the  RIM  set  of  the  relation  in  the 
RHT.  Keep  a  LEM  cardinality  count  and  increment  it  for  each 
LHT  record  created  by  beginning  with  0;  in  the  same  manner 
keep  a  RIM  cardinality  count  beginning  with  0  and  increment 
it  for  each  RHT  record  created  while  performing  the  above 
functions. 

5.  Hash  to  the  relation  table  under  the  new  relation 
identifier  indicated  above  and  establish  the  record  of  the 
new  relation  in  the  relation  table.  Copy  the  BASE  field  of 
the  original  relation's  record  into  the  BASE  filed  of  the  new 
relation's  record.  Put  the  last  value  of  the  RIM  cardinality 
count  into  the  |RIM|  field,  and  puc  the  last  value  of  the  LEM 
cardinality  count  into  the  |LBM|  field  of  the  new  relation's 
record.  Put  the  pointers  to  the  first  records  of  the  LEM  and 
the  RIM  set  of  the  original  relation  into  the  PFLM  and  PPRM 
fields  of  that  record. 

Mow  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  for  this  algorithm. 

We  will  assume  that  in  the  worst  case  the  argument  set  is 
a  super  set  of  both  the  RIM  set  and  the  LEM  set  of  the 
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original  relation;  in  that  caaa  we  necessarily  copy  the 
whole  RIM  set  and  the  LEM  set  of  the  original  relation  in 
order  to  obtain  the  LEM  set  and  the  RIM  set  of  the  new 
relation.  Me  write  the  worst  case  tiae  coaplexity  function 
of  that  algor itha  as  follows: 

f  *  L*a  ♦  S*n  ♦  C 
where: 

n  ■  The  cardinality  of  the  RIM  set  of  the  original 
relation. 

a  ■  The  cardinality  of  the  LEM  set  of  the  original 
relation. 

L  ■  The  constant  nun  bar  of  aeaory  references  aade  while 
copying  each  UN  set  record. 

S  *  The  constant  number  of  aeaory  references  aade  while 
copying  each  RIM  set  record. 

In  the  above  function  the  second  tera  corresponds  to  step  2, 
the  first  tera  corresponds  to  step  3,  and  the  last  tern 
corresponds  to  steps  1  and  5  of  the  algorithm. 

Let  the  cardinalities  of  the  LEM  and  RIM  sets  of  the 
relation  be  equal  and  T  ■  L+S;  then  the  complexity  function 
becomes; 

f  -  T*n  ♦  C 

So  we  conclude  that  the  worst  case  asyaptotical  tiae 
complexity  behaviour  of  this  algorithm  is  0(n)  where  ”n”  is 


the  coaaon  cardinality  of  the  LEM  and  RIM  sets  of  the 
relation. 

Mow  we  have  to  ask  how  could  we  do  this  operation  on  the 
table  representation.  The  algor itha  is  as  follows: 

1.  Start  froa  the  beginning  of  the  linked  list  structure 
of  the  relation's  table#  and  proceed  down  in  that  linked  list 
record  by  record.  Por  each  record  found  in  this  Banner#  hash 
into  the  SET  with  the  individual  being  represented  by  the 
"right"  field  of  that  table  record  under  the  arguaent  set's 
identifier.  If  there  exists  an  SHT  record  for  that 
individual#  hash  into  the  SHT  with  the  individual  being 
represented  by  the  "left"  field  of  that  table  record  under 
the  arguaent  set's  identifier.  If  there  exists  an  SHT  record 
for  that  individual  also  create  a  new  table  record#  copy  the 
"left"  and  the  "right"  fields  of  the  original  table  record 
into  the  corresponding  fields  of  the  new  table  record.  Link 
the  new  table  records  created  in  this  Banner  to  each  other  by 
their  "link"  fields  as  they  are  created. 

Now  we  will  do  the  worst  case  asyaptotical  tiae 
coaplexity  analysis  of  this  algoritha. 

In  the  worst  case  the  arguaent  set  aay  be  a  super  set  of 
the  LEM  set  and  the  RIM  set  of  the  original  relation.  In 
that  case  we  necessarily  aake  a  separate  copy  of  the  whole 
table  of  the  original  relation  in  order  to  obtain  the  table 
of  the  new  relation.  Proa  that  point  on  the  worst  case  tiae 


complexity  analysis  of  the  above  algorithm  is  exactly  the 
same  as  the  worst  case  time  complexity  analysis  of  the 
algorithm  for  the  "Right  Restriction"  operation  that  works  on 
the  table  representation.  So  the  reader  should  refer  to  the 
analysis  done  in  the  "Right  Restriction”  operation  case. 

Dual  Application  (R  #  S)  : 

This  operation  takes  two  relation  identifiers  and 
constructs  a  new  relation  which  relates  the  individuals  with 
the  pairs.  Given  a  pair  in  relation  with  an  individual  under 
the  resulting  relation#  the  left  individual  of  this  pair  is 
the  result  of  application  of  the  first  argument  relation  (R) 
to  this  individual  and  the  right  individual  of  this  pair  is 
the  result  of  application  of  the  second  argument  relation  (S) 
to  this  individual.  So  the  resulting  relation's  right 
members  set  consists  of  individuals  and  the  left  members  set 
consists  of  pairs. 

The  algorithm  for  Hash-Incidence-Vector  representation  is 
as  follows: 

Let  R  be  the  first  argument  relation  and  S  be  the  second 
argument  relation. 

1.  Find  the  records  of  the  relations  in  the  relation 
table  by  hashing  with  the  relation  identifiers  to  the 
relation  table. 

2.  follow  the  pointer  in  the  PFRH  field  of  the  relation 
R's  record;  find  the  first  right  member's  record  in  the  RHT. 


3.  Proceed  in  the  RXM  set  of  the  relation  R  by  following 
the  TASB  links  between  the  records;  for  each  RXM  set  record 
found  in  this  manner  do  the  steps  below: 

a.  Hash  with  the  individual  being  represented  by  the 
current  RIM  set  record  to  the  RHT  under  the  relation  8.  If  a 
record  for  that  individual  is  also  present  in  the  RXM  set  of 
relation  S.  Hash  into  the  RHT  with  the  individual  in 
question  under  the  new  relation  identifier ,  "R#S".  Establish 
a  record  for  that  individual;  if  this  is  the  first  record 
established  in  RHT  for  the  new  relation  nark  it  with  pointer 
P. 

b.  Apply  relation  R  to  the  individual  in  question  by 
calling  the  "apply”  algorithm;  record  the  pointer  returned  in 
variable  "tempi”. 

c.  Apply  relation  S  to  the  individual  in  question  by 
calling  the  "apply"  algorithm;  record  the  pointer  returned  in 
pointer  variable  ”tenp2”. 

d.  Follow  the  pointer  recorded  in  variable  "tempi" 
and  find  the  individual  resulting  from  the  application  of  R 
to  the  current  right  individual.  Xn  the  same  manner,  follow 
the  pointer  recorded  in  variable  ”temp2”  and  find  the  second 
individual  resulting  from  the  application  of  the  relation  S 
to  the  right  individual  in  question. 


•.  Call  algorithm  "pair"  with  thosa  individuals  and 
record  tha  pointar  to  tha  racord  of  tha  "pair”  ralation 
resulting,  in  tha  variable  "ta*p3". 

£.  Hash  to  tha  LHT  undar  tha  naw  ralation  idantifiar 
■R#S",  astablish  a  LEM  racord  of  that  ralation  in  LHT,  put 
tha  pointar  racordad  in  pointar  variable  "ta»p3"  into  tha  PML 
fiald  of  that  racord.  If  it  is  tha  first  LEM  racord  craatad 
in  this  manner,  nark  it  with  pointar  Q.  Sat  tha  PRLM  link  of 
tha  currant  RIM  sat  racord  to  tha  currant  LEM  sat  racord 
craatad  above.  Link  tha  racords  craatad  in  tha  LHT  and  RHT 
for  tha  naw  ralation  to  aach  othar  by  thair  TASE  links  as 
thay  ara  craatad  (axcapt  tha  "dummy”  racords).  Reap  a  RIM 
indax  count  and  LEM  cardinality  count  for  aach  RIM  sat  racord 
craatad,  incramant  tha  RIM  indax  count  and  put  tha  updatad 
value  of  tha  RIM  indax  count  into  tha  indax  fiald  of  tha  RIM 
sat  racord  craatad.  For  aach  LEM  racord  craatad  incramant 
tha  LEM  cardinality  count.  (*  According  to  our  convention  of 
establishing  indices  in  tha  LEM  racords  wa  can  not  astablish 
tha  indices  in  tha  LEM  racords  until  after  tha  cardinality  of 
tha  RIM  sat  of  tha  relation  becomes  evident.  *) 

4.  Start  from  tha  beginning  of  LEM  sat  of  tha  ralation 
by  following  tha  pointar  Q,  for  aach  racord  found  by 
following  tha  TASE  links  between  tha  racords,  incramant  tha 
LEM  indax  count  by  tha  last  value  of  tha  RIM  indax  count  (by 


beginning  from  1)  end  put  the  updated  value  of  the  LEM  index 
count  into  the  index  field  of  the  record  in  question. 

5.  Allocate  a  block  of  aeaory  as  large  as: 

(LEM-cardinality-count*RIM-index-count)/C 

where  C  is  the  aeaory  word  length. 

Initialise  that  vector  to  all  seros  (*  This  aay  turn  out  to 
be  the  costly  part  of  the  algoritha  if  we  are  not  using 
pipelining.  *) 

6.  Bash  to  the  relation  table  with  the  new  relation's 
identifier  "R#S",  establish  its  record  and  put  the  pointers  P 
and  Q  into  the  PFRM  and  PFLM  fields  of  that  record 
respectively.  Put  the  beginning  address  of  the  incidence 
vector  allocated  in  step  5  into  the  BASE  field,  put  the  LBM 
cardinality  count  into  the  (LSMf  field,  and  put  the  RIM  index 
count  into  the  |rim|  field  of  that  record  respectively. 

7.  Start  froa  the  beginning  of  the  RIM  set  of  new 
relation.  For  each  RIM  set  record  found  by  following  the 
TA3B  links  between  the  records  get  the  index  of  that  record. 
Follow  the  pointer  found  in  the  PRLM  field  of  this  record  and 
find  the  record  of  the  left  individual  that  is  in  relation 
with  the  current  right  individual  in  LHT.  Get  the  index  of 
that  record,  reference  the  incidence  vector  with  those 
Indices  by  calling  the  algoritha  "reference”;  and  put  1  into 
the  corresponding  incidence  vector  entry. 


Mow  we  will  do  the  worst  case  asyaptotical  tiae 


coaplexity  analysis  of  this  algoritha. 

The  RIM  set  of  the  resulting  relation  is  in  fact  the 
intersection  of  the  RIM  sets  of  the  arguaent  relations.  So 
in  the  worst  case  the  cardinality  of  the  RIM  set  of  the 
resulting  relation  aay  be  as  large  as  the  cardinality  of  the 
RIM  set  of  the  relation  R.  In  addition  to  that,  in  the  worst 
case  each  right  individual  of  the  resulting  relation  aay 
correspond  to  a  unique  left  individual.  That  aeans  we  create 
a  pair  relation  for  each  right  individual  found  to  be  in  the 
RIM  set  of  the  resulting  relation.  That  also  aeans  that  the 
cardinality  of  the  RIM  set  of  the  resulting  relation  is  equal 
to  the  cardinality  of  the  LBM  set.  Under  these 
considerations,  we  write  the  worst  case  tiae  coaplexity 
function  as: 

f  -  8*n  +  T*n  +  U*(n*n)/D  ♦  V*n  +  C 
where  n  is  the  cardinality  of  the  RIM  set  of  relation  R, 
constant  D  is  the  aeaory  word  length,  constant  C  is  the 
nuaber  of  aeaory  references  aade  in  steps  1,  2  and  6.  In  the 
above  function:  First  tera  corresponds  to  step  3,  second 
tera  corresponds  to  step  4,  third  tera  corresponds  to  step  5, 
fourth  tera  corresponds  to  step  7  of  the  algoritha.  In  the 
third  tera  the  expression: 

(n  x  n)/D 
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stands  for  the  size  of  the  incidence  vector  in  terns  of  the 
number  of  nenory  locations  occupied.  As  we  mentioned  above# 
in  the  worst  case  the  cardinality  of  the  RIM  set  and  the 
cardinality  of  the  LEM  set  of  the  resulting  relation  are 
equal  to  the  cardinality  of  the  RIM  set  of  elation  R  (which 
is  *n»). 

Let: 

Z  ■  S  +  T  +  V, 
and  W  -  0/D 

then  the  complexity  function  becomes: 

f  *  W*(n2)  ♦  z*n  ♦  C 

So  we  conclude  that  the  worst  case  asymptotical  time 
complexity  behaviour  of  this  algorithm  is  0(n2). 

Now  we  have  to  think  about  how  the  operation  could  be 
performed  on  the  table  representation.  The  algorithm  for  the 
table  representation  is  simpler  than  the  previous  algorithm. 
The  algorithm  is  as  follows: 

Let  R  be  the  first  argument  relation  and  S  be  the  second. 

1.  Start  from  the  beginning  of  the  relation  S's  table 
and  proceed  down  in  the  table  record  by  record  by  following 
the  links  between  the  table  records.  For  each  record  found# 
extract  the  right  individual  and  hash  with  that  individual 
into  the  SCBT#  establish  a  SCHT  record  for  that  individual  in 
SCHT#  extract  the  left  individual  and  create  another  SCHT 
record  for  that  individual#  and  link  the  record  of  the  right 


individual  to  that  record  by  its  TASK  link.  If  after  hashing 
with  that  right  individual*  a  SCHT  record  of  that  individual 
is  found  to  be  present  in  SCHT*  do  nothing.  Link  the  records 
created  in  the  SCOT  to  each  other  by  their  TASB  links. 
(*  X.e.*  establish  the  SCHT  record  of  a  right  individual  and 
the  record  of  the  corresponding  left  individual  in  SCHT  only 
once.  *)  (*  As  the  result  of  the  execution  of  step  1*  each 
right  individual  has  a  record  in  SCHT  and  is  followed  by  the 
record  of  the  left  individual  which  would  have  resulted  froa 
application  of  relation  S  to  that  right  individual.  Note 
that  the  SCHT  record  created  for  the  left  individual  is  not 
connected  to  any  SCHT  entry.  *) 

2.  Start  froa  the  beginning  of  relation  R's  record. 
Proceed  in  the  table  of  the  relation  R.  Por  each  table 
record  of  R  found  in  this  wanner*  extract  the  right 
individual*  hash  with  that  right  individual  into  the  SCHT. 
If  a  record  of  that  individual  is  not  already  present  in  SCHT 
then  do  nothing*  and  continue  with  the  next  table  record  in 
R.  Otherwise  create  a  new  table  record*  put  the  pointer  to 
the  right  individual  in  question  into  the  "right”  field  of 
that  record*  and  create  another  relation  with  only  one  table 
record.  Bxtract  the  pointer  to  the  left  individual  froa  the 
"left"  field  of  the  current  table  record  in  R*  put  that 
pointer  into  the  "left"  field  of  the  new  relation's  (pair) 
record*  follow  the  TASB  link  of  the  SCHT  record  of  the  right 
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individual  in  question  and  £ind  the  next  SCHT  record  in  the 
SCOT.  Extract  the  pointer  to  the  left  individual  (that  was  in 
the  "left”  field  of  the  relation  S's  table  record  before) 
from  that  record  and  in  the  same  manner  put  that  pointer  into 
the  "right”  field  of  the  pair  relation's  record.  Establish 
the  record  of  the  pair  (a  singleton  relation)  in  the  relation 
table.  Put  the  pointer  to  this  record  into  the  "left”  field 
of  the  new  table  record  created  for  the  resulting  relation 
above.  Delete  the  SCHT  record  for  the  right  individual  in 
question#  and  the  following  SCOT  record  (which  belongs  to  the 
left  individual  that  is  In  relation  with  the  right  individual 
in  question  under  the  relation  S)  from  the  SCHT.  update  the 
TASK  links  between  the  SCHT  records  appropriately. 
(*  Because  there  is  no  need  to  create  a  table  record  of  the 
resulting  relation  again  for  that  right  individual.  *)  Link 
the  resulting  relation's  records  to  each  other  as  they  are 
created  by  their  link  fields. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

As  we  see  the  algorithm  is  simpler  than  the  previous 
algorithm.  The  complexity  function  of  that  algorithm  is  as 
follows  t 

f  -  K*pl  +  L*p2  +  C 

We  know  that  in  the  worst  case  the  sizes  of  the  relations  are 
equal  to  the  product  of  the  cardinalities  of  their  LEM  and 


RIM  sets.  Since  we  have  accepted  the  LEM  set  and/or  the  RIM 
set  cardinality  as  a  measure,  we  have  to  write  the  function 
in  terms  of  these  cardinalities.  Before  doing  that  we  have 
to  explain  the  meanings  of  the  constants  and  variables  in  the 
above  function.  The  variable  "pi”  is  the  size  of  the 
relation  S,  variable  ”p2"  is  the  size  of  the  relation  R,  the 
constant  K  is  the  number  of  memory  references  made  for  each 
iteration  in  step  1,  and  in  the  same  sense,  the  constant  L  is 
the  number  of  memory  references  made  for  each  iteration  in 
step  2.  In  the  above  function,  the  first  term  represents  the 
step  1,  and  the  second  term  represents  the  step  2  of  the 
algorithm.  The  constant  C  is  the  number  of  memory  references 
made  by  the  housekeeping  operations  such  as  updating  the 
relation  table. 

Let: 

pi  «  p2  ■  n*n 

where  ”n”  is  the  common  cardinality  of  the  LEM  and  RIM  sets 
of  the  argument  relations,  and  let: 

Z  -  K+L 

Then  the  complexity  function  becomes: 
f  *  Z*n*n  +  C 

So  we  conclude  that  the  worst  case  asymptotical  time 
complexity  behaviour  of  that  algorithm  is  0(n2). 


Parallel  Application  (r||S)  : 

This  operation  takes  two  relation  identifiers  and 
constructs  a  new  relation,  in  which  each  left  member  and  each 
right  member  is  a  pair.  Given  a  tuple  of  this  relation,  the 
left  component  is  a  pair,  in  which  the  left  individual  of  the 
pair  is  the  result  of  application  of  the  first  argument 
relation  to  the  left  individual  of  the  right  component  (which 
is  also  a  pair)  of  the  tuple.  Analogously,  the  right 
individual  of  the  left  member  (which  is  a  pair)  is  the  result 
of  application  of  the  second  argument  relation  to  the  right 
individual  of  the  right  component. 

This  algorithm  is  naturally  more  complex  than  the 
previous  algorithms.  The  algorithm  for  Hash-Incidence-Vector 
representation  is  as  follows: 

Let  the  first  argument  relation  be  R,  and  the  second 
argument  relation  be  S. 

1.  Start  from  the  beginning  of  RIH  set  of  relation  R  and 
proceed  down  in  the  RIM  set  record  by  record.  Por  each  RIM 
set  member  found  in  this  manner  do  the  steps  below: 

a.  Start  from  the  beginning  of  RIM  set  of  the 
relation  S  and  find  each  record  representing  a  RIM  set 
individual  in  turn  by  following  the  TASE  links  between  the 
records  of  the  RIM  set. 

b.  For  each  pair  of  individuals  found  in  step  1  and 
step  1-a  (i.e.,  we  find  the  records  that  represent  the 


individuals  then  we  extract  the  individuals  by  following  the 
pointers  in  their  PML  fields)/  call  the  algorithm  "pair"  with 
the  individual  obtained  in  step  1  as  the  first  argument 
individual  and  the  individual  obtained  in  step  1-a  as  the 
second  argument  individual.  (*  Mote  that  the  algorithm 
"pair”  first  looks  in  the  relation  table  for  that  "pair" 
relation  and  executes  this  operation  only/  if  this  relation 
has  not  been  created  before.  *) 

c.  Hash  to  the  RHT  with  the  identifier  of  the  pair 
constructed  by  "pair”  operation  (say  ”(X/y)"  under  the 
relation  identifier  "R||S”.  Create  a  RHT  record  and  put  the 
pointer  to  the  record  of  the  pair  relation  established  in 
relation  table  into  the  PML  field  of  that  record.  Every  time 
a  RIM  set  record  is  created/  put  the  updated  value  of  the  RIM 
set  index  count  into  the  index  field  of  that  record.  If  the 
record  is  the  first  RIM  record  created  in  this  manner  mark  it 
with  pointer  "2". 

d.  Apply  the  relation  R  to  the  individual  obtained 
in  step  1  by  calling  the  "Function  application"  algorithm. 
Call  the  individual  returned  by  this  algorithm  W. 

e.  Apply  the  relation  S  to  the  individual  obtained  in 
step  1-a  by  calling  the  "Function  application”  algorithm. 
Call  the  individual  returned  Y. 

f.  Repeat  step  1-b  for  the  individuals  "W"  and  "Y". 


g.  Repeat  step  1-c  for  the  pair  "(W,Y)"  on  the  LHT, 
but  this  time  do  not  furnish  the  index  fields  of  the  records. 
Keep  a  LEM  set  cardinality  count  and  increment  it  for  each 
record  created.  Mark  the  beginning  record  of  the  LEM  set 
with  the  pointer  U.  (*  In  this  case  duplications  of  the  LEM 
records  may  occur.  To  prevent  this,  our  algorithm  does  not 
create  the  LEM  record  of  a  pair  if  there  is  already  a  LEM 
record  for  that  pair  in  the  LEM  set  of  the  new  relation. 
This  is  a  property  of  the  hashing  mechanism.  *)  Set  the  PRLM 
link  of  the  current  RIM  record  of  the  new  relation  to  the  LEM 
record  of  the  left  individual  constructed  in  steps  1-d 
through  1-g.  (*  The  result  of  one  iteration  of  step  1  yields 
the  establishment  of  one  right  member  and  one  left  member  of 
the  new  relation  "R||S"  that  are  in  relation  with  each  other 
under  this  relation.  The  result  of  exhaustive  execution  of 
step  1  is  the  creation  of  complete  LEM  and  RIM  sets  of  new 
relation.  *) 

2.  Start  from  the  beginning  of  the  LEM  set  of  new 
relation,  and  proceed  down  in  the  LEM  set  record  by  record  by 
keeping  an  index  count  and  incrementing  it  by  the  last  value 
of  the  RIM  set  index  count  for  each  record  passed.  Put  the 
updated  count  into  the  index  field  of  each  record  passed. 

3.  Allocate  a  block  of  memory  as  large  as: 

( LEM-car d i nal i ty-coun t  *RIM- i nde x-coun t ) /C 
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where  C  is  the  memory  word  length.  Initialize  that  memory 
block  to  all  zeros. 

4.  Hash  to  the  relation  table  under  the  relation 
identifier  R||S,  create  a  new  RT  record,  put  pointers  Z  and  u 
into  the  PFRM  and  PFLM  fields  of  that  record  respectively, 
put  the  beginning  address  of  the  memory  block  allocated  in 
step  3  into  the  BASS  field,  and  put  the  LBN  cardinality  count 
and  RIM  index  count  into  the  |lem|  and  [rim|  fields  of  that 
record  respectively. 

5.  Start  from  the  beginning  of  the  RIM  set  of  new 
relation  and  proceed  down  in  the  RIM  set  record  by  record. 
For  each  record  found: 

a.  Extract  the  index  field  of  the  record. 

b.  Follow  the  pointer  in  the  PRLM  link  field  of  that 
record  and  find  the  record  of  the  left  individual  (pair)  in 
relation  with  the  current  right  individual.  Extract  the 
index  field  of  that  record. 

c.  Reference  the  new  relation's  incidence  vector  with 
those  indices  obtained  in  steps  5-a  and  5-b  and  put  1  into 
the  corresponding  incidence  vector  entry. 

Mow  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

The  complexity  function  of  this  algorithm  can  be  written 
as  follows: 

f  *  H*nl*n2  +  T*ml*m2  +  (R*nl*n2*ml*m2)/D  +  0*nl*n2  +  C 


in  which  the  first  tern  corresponds  to  step  1,  second  term 
corresponds  to  step  2,  third  term  corresponds  to  step  3, 
fourth  term  corresponds  to  step  5  of  the  algorithm.  Constant 
C  refers  to  the  constant  number  of  memory  references  made  by 
the  other  steps  of  the  algorithm.  In  the  above  function: 
ml  *  the  cardinality  of  the  L2M  set  of  relation  R, 

nl  *  the  cardinality  of  the  RIM  set  of  relation  R, 

M2  «  the  cardinality  of  the  LEM  set  of  relation  S, 

n2  ■  the  cardinality  of  the  RIM  set  of  relation  S. 

In  the  above  function  the  second ,  third  and  fourth  terms  may 
not  be  clear  to  the  reader.  First  of  all  we  are  creating 
records  for  all  possible  pairs  that  can  be  constructed  from 
the  RIM  set  individuals  of  the  argument  relations  and  each 
one  of  those  pairs  becomes  a  right  individual  of  the  new 
relation.  Hence  the  cardinality  of  the  RIM  set  of  new 
relation  i as 

| RIM |  ■  nl*n2  (nl  and  n2  are  as  defined  above) 

This  may  not  be  true  for  the  cardinality  of  LEM  set  of  the 
new  relation  since  two  or  more  RIM  set  individuals  of  the  new 
relation  may  be  in  relation  with  one  LEM  set  individual.  But 
in  the  worst  case  the  cardinality  of  the  LEM  set  becomes  the 
product  of  the  cardinalities  of  the  LEM  sets  that  belong  to 
argument  relations.  That  means: 

| LEM |  *  ml*m2  (ml  and  m2  are  as  defined  above.) 


We  know  that  tha  size  of  the  incidence  vector  is  computed  by 
the  formula : 

size  *  (|lem|*|rim|)/d 

where: 

0  *  Meaory  word  length 

So  the  size  of  the  incidence  vector  can  be  written  in  terms 
of  the  cardinalities  of  the  LEM  sets  and  the  RIM  sets  of  the 
argument  relations  as: 

SIZE  -  (nl*n2*al*a2)/D 

So  while  we  are  establishing  the  indices  of  the  new 
relation's  LEM  records  in  step  2,  we  make  a  number  of  meaory 
references  proport ional  to: 
al*a2 

and  while  we  are  initializing  the  incidence  vector  we  make  a 
number  of  meaory  references  proportional  to: 

(nl*n2*al*m2)/o 

In  the  same  sense,  while  we  are  establishing  the  l's  in  the 
new  incidence  vector  we  are  making  a  number  of  meaory 
references  proportional  to: 

(nl*n2) 

Now  let  nl«n2«ml*a2  -  n,  Z  »  (N+T+U)  and  H*R/D,  then  the 
complexity  function  of  the  algorithm  can  be  written  as: 
f  -  H*(n4)  ♦  Z*(n2)  +  C 

The  algorithm  is  terribly  expensive.  The  reason  is  we  had  to 
initialize  the  incidence  vector  of  the  resulting  relation 


which  a  ad*  tha  algorithm  an  order  four  algorithm.  But  avan 
in  tha  abaanca  of  this  tarm  (initialisation),  tha  algorithm 
is  tarribly  expensive  bacausa  tha  constant  Z  is  axpactad  to 
ba  so  larga.  In  addition  to  that  tima  daficiancy,  tha 
algorithm  is  storaga  inafficiant.  We  have  to  construct  tha 
huga  incidanca  vactor  of  tha  rasulting  ralation,  bacausa  tha 
relation  may  participate  in  subsequent  operations.  So  tha 
algorithm  is  practically  infeasible.  Fortunately  the 
intensional  algorithms  for  this  operation  are  cheap  so  we  can 
do  this  operation  intensionally. 

Now  we  have  to  define  tha  algorithm  for  tha  table 
representation.  In  this  algorithm  we  will  use  two  scratch 
hash  tables,  SCBTl  and  SCHT2,  to  make  tha  algorithm  easier  to 
understand.  Of  course  the  algorithm  can  be  defined  by  using 
only  one  scratch  hash  table  and  by  using  a  good  collision 
handling  policy,  but  that  makes  tha  algorithm  very  complex  to 
understand.  Tha  steps  of  tha  algorithm  are  as  follows: 

Let  R  ba  tha  first  argument  ralation  and  S  be  the  other. 

1.  Do  step  1  of  the  algorithm  given  for  the  table 
representation  in  the  dual  application  operation,  on  relation 
S  on  the  SCBTl. 

2.  Do  step  1  of  tha  algorithm  given  for  the  table 
representation  in  tha  dual  application  operation,  on  relation 
R  on  tha  SCHT2. 
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3*  Start  from  the  beginning  of  the  linked  list 
constructed  in  SCBT2  and  proceed  down  in  that  linked  list  by 
skipping  the  records  between  the  right  individuals'  records. 
For  each  right  individual  record  found  in  this  manner  begin 
from  the  beginning  of  the  linked  list  constructed  in  SCHT1 
and  proceed  down  in  that  linked  list  by  skipping  the  left 
individuals'  records  between  the  right  individuals'  records. 

a.  For  each  pair  of  right  individuals  found  above 
(one  of  the  right  individuals  belongs  to  R  and  the  other 
right  individual  belongs  to  S)  call  algorithm  "pair"  with 
those  individuals  as  arguments.  Create  a  new  table  record  of 
the  resulting  relation,  and  put  the  pointer  to  the  table  of 
the  relation  (pair)  constructed  by  algorithm  "pair"  into  the 
•right*  field  of  that  record. 

b.  Find  the  SCBT  record  immediately  following  the 
right  individual's  record  in  SCBT2  (which  belongs  to  the  left 
individual  that  is  in  relation  with  that  right  individual 
under  the  relation  R) .  Extract  the  individual  being 
represented  by  that  record,  find  the  SCBT  record  immediately 
following  the  right  individual's  record  in  SCBT1  (which 
belongs  to  the  left  individual  that  is  in  relation  with  that 
right  individual  under  the  relation  S),  and  extract  the 
individual  being  represented  by  that  record. 

c.  Call  algorithm  "pair*  with  the  individuals  found 
in  step  3-b  as  arguments  and  establish  the  pointer  to  the 


resulting  relation  constructed  by  algorithm  "pair”  in  the 
"left"  field  of  the  new  relation's  record  created  in  step  3-a, 
4.  Link  the  records  of  the  new  relation  created  above  to 
each  other ,  by  their  link  fields  as  they  are  created. 

Mow  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

As  can  be  seen  the  algorithm  is  surprisingly  simple  and 
efficient  relative  to  our  previous  algorithm.  In  fact  the 
cost  of  that  algorithm  is  much  less  than  the  cost  of  the 
previous  algorithm.  The  worst  case  time  complexity  function 
of  that  algorithm  can  be  written  as: 

f  *  K*pl  +  L*p2  +  M*nl*n2  +  c 
where: 

pi  *  The  sise  of  the  relation  S , 

p2  -  The  sise  of  the  relation  R,. 

nl  ■  The  cardinality  of  the  RIM  set  of  relation  S, 

n2  ■  The  cardinality  of  the  RIM  set  of  relation  Rr 

and/  the  first  term  corresponds  to  step  1/  the  second  term 
corresponds  to  step  2,  the  third  term  corresponds  to  steps  3 
and  4  of  the  algorithm.  The  constant  C  represents  the  number 
of  memory  references  made  by  the  housekeeping  operations. 
Let  the  cardinalities  of  all  the  LEM  and  RIM  sets  of  the 
argument  relations  be  the  same  and  equal  to  "n".  We  know 
that  in  the  worst  case: 
pi  •  p2  ■  n*n 


Under  these  considerations#  the  above  complexity  function 
becomes s 

f  ■  I*n*n  ♦  C 
where s 

*  -  I  ♦  L  ♦  N 

So  we  conclude  that  the  cost  of  that  algorithm  is  much  less 
than  the  cost  of  our  previous  algorithm#  and  the  algorithm 
has  the  worst  case  asymptotical  time  complexity  behaviour  of 
0(n2).  Our  previous  algorithm  is  expensive  because  of  the 
initialisation  of  the  incidence  vector.  Suppose  the 
initialised  memory  block  for  the  incidence  vector  is  always 
available#  and  we  do  not  have  to  initialise  the  incidence 
vector.  Even  under  this  condition  our  previous  algorithm#  by 
having  a  huge  constant  in  front  of  the  second  degree  term#  is 
more  expensive  than  this  algorithm.  We  want  to  point  out 
that  this  is  the  first  operation  for  which  the  table 
representation  allowed  us  to  define  a  more  efficient 
algorithm  than  the  Bash-Incidence-Vector  representation 
allowed.  The  weakness  of  the  Hash-Incidence-Vector 
representation  is  that  it  requires  the  incidence  vector  to  be 
initialised;  on  the  other  hand  the  incidence  vector  is 
structured  enough  to  be  pipelined  especially  for 
initialisation.  So  for  small  relations  with  small  incidence 
vectors#  this  deficiency  is  omi table. 


First  Ancestral  (fan:R)  t 


This  operation  takes  a  relation  identifier  and  produces 
the  reflexive,  transitive  closure  of  that  relation;  the 
resulting  relation  is  also  called  the  first  ancestral  of  the 
given  relation. 

Because  the  resulting  relation  will  be  reflexive,  it  will 
contain  those  tuples  in  which  the  left  individual  and  the 
right  individual  are  the  same  in  addition  to  the  tuples 
obtained  by  performing  second  ancestral  operations  on  the 
original  relation.  This  also  implies  that  the  LBN  set  and 
the  RIM  set  of  the  resulting  relation  will  be  equal  to  the 
MEN  set  of  the  original  relation.  Mote  that  in  this  case 
Marshall's  algorithm  can  be  applied  without  modification. 

The  algorithm  for  Bash- Incidence-Vector  representation  is 
as  follows; 

1.  Find  the  record  of  the  relation  in  the  relation  table 
by  hashing  with  the  relation  identifier  to  the  relation 
table. 

2.  Follow  the  PFLM  field  of  that  record  and  find  the 
first  left  member's  record. 

3.  By  following  the  TASE  links  between  the  records, 
proceed  down  in  the  LEM  set.  For  each  record  found  in  this 
manner,  do  the  steps  below: 

a.  Extract  the  individual  represented  by  that 

record. 
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b.  Bash  to  the  LHT  and  the  RHT  with  that  individual 
under  the  new  relation's  identifier,  "fan:R";  establish  both 
the  LEM  record  and  the  RIM  record  for  that  individual  in  the 
LHT  and  in  the  RHT  respectively.  If  these  are  the  first 
records  created  in  this  manner,  mark  them  with  pointer  P  and 
Q  respectively.  If  after  hashing  to  the  LHT/RHT,  a  record  of 
that  individual  is  found  to  be  present  in  the  LHT/RHT,  do  not 
create  LHT/RHT  records  for  that  individual.  Link  the  records 
created  in  this  manner  by  their  TASE  links  in  the  LHT  and  in 
the  RHT. 

4.  Continue  with  the  RIN  set  of  the  original  relation  by 
beginning  from  the  beginning  of  the  RIM  set  (i.e.,  go  to  step 
3). 

5.  In  steps  3  and  4,  keep  a  right  members  index  count 
for  the  RIM  set  being  constructed  and  increment  it  for  each 
record  created.  Put  the  updated  value  of  that  count  into  the 
index  field  of  the  record  created  most  recently.  After  the 
execution  of  steps  3  and  4,  start  from  the  beginning  of  the 
LEM  set  of  the  resulting  relation  and  keep  a  LEM  set  index 
count.  Beginning  from  1,  increment  it  by  the  last  value  of 
the  RIM  set  index  count  for  each  record  of  the  LEM  set  passed 
and  put  the  updated  value  of  this  count  into  the  "index" 
field  of  the  current  record. 


6.  Allocate  a  block  of  memory  for  the  new  incidence 


vector  as  large  as: 

(RIM-set-index-count) 2 
C 

where  C  is  the  memory  word  length.  Save  the  beginning 
address  of  this  block. 

7.  Hash  to  the  relation  table  under  the  new  relation's 
identifier,  "fan:R",  where  R  is  the  identifier  of  the 
original  relation.  Establish  the  record  of  that  relation  in 
the  relation  table  and  put  pointer  P  and  Q  into  the  PFLM  and 
the  PFRM  fields  of  that  record  respectively.  Put  the  last 
value  of  the  RIM  set  index  count  into  the  ILEM  |  and  the  |RIM| 
fields  and  put  the  beginning  address  of  ‘the  incidence  vector 
into  the  BASE  field  of  that  record. 

8.  Start  from  the  beginning  of  the  LEM  set  of  the 
original  relation  and  proceed  down  in  the  LEM  set  record  by 
record.  For  each  record  found  do  these  steps: 

a.  Call  the  left  individual  being  represented  by  the 
current  LEM  set  record  (found  above)  X.  Reference  the  new 
relation  with  the  tuple  <X,X>  and  put  1  into  the  incidence 
vector  entry  found. 

b.  Start  from  the  beginning  of  the  RIM  set  of  the 
original  relation  and  find  the  records  of  the  RIM  set  in  turn 
by  following  the  TASE  links  between  the  records. 

c.  For  each  tuple  represented  by  the  pair  of  records 
found  in  steps  8  and  8-b,  reference  the  original  relation's 


incidence  vector  and  reference  the  new  relation.  Find  the 
corresponding  incidence  vector  entry  and  copy  the  entry  of 
the  original  incidence  vector  to  the  new  incidence  vector's 
corresponding  entry. 

9.  Execute  step  6  of  the  second  ancestral  operation's 
algorithm  on  the  new  incidence  vector.  But  in  this  case  let 
the  | RIM |  represent  the  cardinality  of  the  RIM  set  of  the  new 
relation  instead  of  the  cardinality  of  the  RIM  set  of  the 
original  relation  (and  similarly  for  the  |LEM|). 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

In  the  worst  case  the  relation  may  be  a  universal 
relation  on  its  MEM  set;  that  means  the  LEM  and  RIM  sets  of 
the  relation  are  exactly  the  same  as  the  MEM  set  of  the 
relation.  So  the  LEM,  RIM  and  MEM  sets  of  the  relation  have 
the  common  cardinality  n.  Under  this  consideration  we  write 
the  worst  case  time  complexity  function  of  the  algorithm  as 
follows : 

f  «  2*K*n  +  M*(n2)  +  N*(n2)*(n/C)  +  D 
where: 


n  -  The  common  cardinality  of  the  RIM  and  LEM  sets  of  the 
original  relation. 

C  *  The  memory  word  length. 

K  »  The  constant  number  of  memory  references  made  in  step 
3  (or  4)  for  each  record  copied. 


M  *  The  constant  number  of  memory  references  made  while 
copying  each  bit  of  the  incidence  vector. 

N  »  The  constant  number  of  memory  references  made  in  each 
iteration  of  the  outer-most  "for"  loop  in  step  6  of 
the  second  ancestral  algorithm. 

D  ■  The  constant  number  of  memory  references  made  by  the 
housekeeping  operations. 

In  the  above  function  the  first  term  corresponds  to  steps  3, 
4  and  5,  the  second  term  corresponds  to  step  8,  the  third 
term  corresponds  to  step  9,  and  the  fifth  term  corresponds  to 
the  remaining  steps  of  the  algorithm. 

The  term  which  constitutes  the  asymptotical  time 
complexity  behaviour  of  the  algorithm  is  the  third  term, 
which  is  the  same  as  the  term  given  in  the  complexity 
function  of  the  second  ancestral  algorithm  for  the 
corresponding  step  (step  6) . 

Now,  let  U  *  2*k  and  T  *  N/C?  then  the  complexity 
function  becomes: 

f  ■  T*(n3)  +  M*(n2)  +  U*n  +  D 

So  by  looking  at  the  term  with  the  largest  exponent  we 
conclude  that  the  algorithm  has  the  worst  case  asymptotical 
time  complexity  behaviour  of  0(n3).  This  algorithm  is 
slightly  more  expensive  than  the  algorithm  for  second 
ancestral  operation,  as  it  is  expected,  but  both  algorithms 
have  the  same  asymptotical  time  complexity  behaviour.  We 


want  to  point  out  that  the  constant  of  the  second  tern  is 
larger  than  the  constant  of  the  second  term  of  the  second 
ancestral  operation's  complexity  function.  The  reason  is,  in 
this  case  we  are  also  obtaining  the  reflexive  closure  of  the 
relation.  The  sane  holds  for  the  third  terms. 

As  we  know,  while  we  are  taking  the  reflexive  closure  of 
a  relation,  we  have  to  add  those  tuples  to  the  relation  which 
have  the  property  that  the  left  component  and  right  component 
are  the  same  individuals.  Because  some  tuples  of  this  kind 
may  already  be  present  in  the  original  relation,  we  have  to 
prevent  ourselves  from  duplication  of  tuples  while  taking  the 
reflexive  closure.  As  a  result  of  this  we  can  not  copy  the 
original  incidence  vector  to  the  new  incidence  vector 
blindly,  as  was  done  in  the  second  ancestral  operation.  So  we 
had  to  define  the  algorithm  in  a  different  manner  and  this 
increased  the  constants  in  front  of  the  second  and  third 
terms  of  the  above  complexity  function. 

The  algorithm  for  the  table  representation  is  similar  to 
the  algorithm  for  the  second  ancestral  operation,  except  we 
have  to  add  those  tuples  in  which  the  left  component  and  the 
right  component  are  the  same,  if  those  tuples  are  not  already 
present  in  the  relation  resulting  from  the  second  ancestral 
operation.  So  the  algorithm  should  detect  if  tuples  of  this 
kind  are  present  in  the  transitive  closure  of  the  original 
relation.  If  they  are  not  it  should  add  those  tuples  to  the 


resulting  relation.  Thus  the  algorithm  becomes  more 
expensive  than  the  algorithm  Cor  the  second  ancestral 
operation^  so  there  is  no  need  to  define  that  extremely 
expensive  algorithm  here. 

Final  Members  (final :R)  : 

This  operation#  given  a  relation/  restricts  the  LEM  set 
of  the  relation  to  the  set: 

{ x 1  not  (x«  (init:R))}  where  R  is  the  identifier  of  the 
given  relation.  So  in  the  relational  language  we  can  write 
the  equivalent  expression  as: 

(-init:R) /R 

So  the  result  of  this  operation  is  a  relation  which  has  those 
left  individuals  that  are  not  the  initial  members  of  the 
given  relation  as  its  LEM  set  members. 

The  algorithm  for  the  Hash-Incidence-Vector 
representation  is  given  below: 

1.  Get  the  relation's  identifier. 

2.  Hash  with  that  relation  identifier  to  the  relation 
table  and  find  the  record  of  the  relation. 

3.  Follow  the  pointer  found  in  the  PFLM  field  of  the 
relation's  record  and  find  the  record  of  the  first  left 
member  of  the  relation. 

4.  Hash  to  the  RHT  with  that  individual  under  the 
relation's  identifier.  If  there  exists  a  record  for  that 
individual  in  the  RIM  set/  make  a  separate  copy  of  the  LEM 
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record  (which  is  found  in  step  3)  of  that  individual  in  the 
LHT,  under  the  relation  identifier  "final :R",  where  R  is  the 
identifier  of  the  original  relation. 

5.  if  there  is  no  record  present  for  that  individual  in 
the  RIM  set#  do  nothing. 

6.  Proceed  in  the  LEM  set  of  the  relation  by  following 
the  TASB  links  between  the  LEM  (LHT)  records.  For  each 
record  found,  repeat  step  4  for  the  individual  represented  by 
that  record  until  the  LEM  set  of  the  relation  is  exhausted. 
While  creating  the  new  relation's  records  in  the  LHT  link 
then  to  each  other  as  they  are  created  (i.e.,  construct  LEM 
set  of  the  new  relation  in  the  LHT).  While  the  above  steps 
are  being  performed  keep  a  count,  and  increment  it  for  each 
new  LEM  record  created. 

7.  Make  separate  copies  of  the  RIM  set  records  of  the 
original  relation  in  the  RHT  under  the  new  relation 
identifier  by  first  following  the  PFRM  pointer  from  the 
original  relation's  record  in  the  relation  table  and  then 
following  the  TASE  links  between  the  records.  For  each  RHT 
record  found  hash  into  the  RHT  with  the  individual 
represented  by  this  record  under  the  new  relation's 
identifier  and  establish  a  record  in  the  RHT.  Copy  all  the 
fields  (except  the  TASE  field)  of  the  original  RIM  record  to 
the  new  record's  corresponding  fields,  and  link  the  records 
created  in  the  manner  explained  above,  to  each  other  as  they 


arc  craatad  (i.e.,  raconstcuct  tha  RIM  sat  o£  the  original 
ralation  as  tha  RIM  sat  of  tha  new  ralation) . 

8.  Establish  tha  naw  relation's  record  in  the  relation 
table  under  tha  naw  relation's  identifier.  Establish  the 
pointers  to  the  naw  LEM  sat  and  tha  naw  RIM  sat  into  tha  PFLM 
and  PERM  fields  of  that  record  respectively.  Copy  tha  "base" 
field  of  tha  original  relation's  record  into  that  record's 
"base"  field.  (Both  relations  share  the  saae  incidence 
vector).  Copy  the  ”|RIM|a  field  of  the  original  relation's 
record  into  the  "|RIM|"  field  of  the  new  relation's  record. 
Put  the  final  value  of  tha  "count*  into  tha  "|LEm|"  field  of 
that  record. 

Mote  that#  as  wa  have  dona  in  the  restriction  operations, 
we  aake  the  new  relation  share  the  incidence  vector  of  the 
original  relation.  Also,  because  we  copied  the  records  as 
they  were  (except  tha  TASE  fields),  the  contents  of  the  index 
fields  did  not  change,  so  tha  naw  ralation  can  share  the 
incidence  vector  with  tha  original  ralation.  The  costly 
steps  of  that  algorithm  are  steps  6  and  7.  In  step  6  we  went 
through  the  LEM  sat  of  tha  original  ralation  exhaustively  and 
in  step  7  wa  copied  tha  RIM  sat  of  tha  original  relation 
Tha  worst  case  complexity  function  of  that  algorithm  is  given 
below s 


f  ■  K*m  +  L*n  ♦  C 


where : 

K  *  Constant  number  of  memory  references  made  for  each 
LBM  element  found  in  step  S. 

L  *  Constant  number  of  memory  references  made  for  each 
RXM  set  element  in  step  7. 

C  ■  Constant  number  of  memory  references  made  in  steps  1, 
2,  3  and  8. 

m  ■  Cardinality  of  the  LEM  set  (original  relation) . 

n  ■  Cardinality  of  the  RIM  set  (original  relation) . 

Let  m  •  n  and  T  *  K  ♦  L,  then  the  function  becomes: 

f  «  T*n  +  C 

So  we  conclude  that  the  algorithm  has  the  worst  case 
asymptotical  time  complexity  behaviour  of  0(n). 

Mow  we  have  to  think  how  the  same  operation  can  be 
performed  on  the  table  representation.  Because  there  exists 
the  possibility  of  the  repetition  of  the  individuals  in  the 
left  and  right  columnar  it  is  obvious  that  the  resulting 
algorithm  will  be  more  costly  than  our  previous  algorithm. 
Me  can  reduce  the  cost  of  the  algorithm  by  making  use  of  the 
SCHT  mechanism.  The  algorithm  is  given  below: 

1.  Perform  steps  1  and  2  of  the  algorithm  given  for  the 
table  representation  in  the  "Initial  members”  operation.  This 
time  use  SCHT  instead  of  SHT. 

2.  Start  from  the  beginning  of  the  left  column  and  for 
each  individual  found  by  following  the  link  fields  of  the 


exceeds  (and  finding  the  left  individual  of  aach  record 
pointed  by  the  pointer  in  the  "left"  field  of  that  record) 
hash  into  the  SCHT.  If  no  record  is  present  in  the  SCHT  for 
that  individual#  sake  a  separate  copy  of  the  table  record  in 
which  we  found  the  individual  in  question.  If  it  is  the 
first  table  record  created#  nark  it  with  a  pointer. 

3.  Perform  the  disconnection  operation  in  the  SCHT  as  it 
was  done  in  "initial  members”  operation  to  clean  up  the  SCHT 
for  the  subsequent  operations. 

As  we  can  see  we  go  through  the  original  relation  three 
times  exhaustively  in  steps  1  and  2.  in  the  worst  case  the 
cardinality  of  the  initial  members  set  constructed  in  the 
SCHT  may  be  equal  to  the  cardinality  of  the  LEM  set#  under 
this  consideration  the  worst  case  time  complexity  function  of 
this  algorithm  can  be  written  as: 

f  «  K*p  ♦  M*p  ♦  N*p  ♦  L*n  +  C 
let  (K  +  M  +  H)  ■  Zj  the  function  becomes: 
f  »  z*p  +  L*n  +  C 

where  Z  >*  3.  Now  we  have  to  explain  the  meanings  of  the 
constants  and  the  variables  shown  in  the  first  function  in 
order  to  make  the  function  clear. 

The  constants  K  and  M  are  the  same  constants  we  defined 
in  the  "Initial  members”  operation  case.  The  constant  L 
represents  the  constant  number  of  memory  references  for 
disconnecting  each  record  from  the  SCHT.  The  constant  N 
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represents  the  constant  number  of  memory  references  made  for 
each  left  individual  found  in  step  2  of  the  algorithm.  The 
constant  C  represents  the  constant  number  of  memory 
references  made  by  the  housekeeping  operations.  The 
variables  are  defined  as  follows: 
p  *  The  relation  size, 
n  *  Cardinality  of  the  LQl  set. 

As  can  be  seen  the  constants  tend  to  be  large  and  we  know 
that  in  the  worst  case  (when  the  relation  is  equal  to  the 
cartesian  product  of  its  LEM  and  RIM  set)  p  is  equal  to  the 
product  of  the  cardinalities  of  the  LEM  and  RIM  sets,  so 
let's  rewrite  the  complexity  function  under  this 
consideration.  Let  n  *  m  (where  m  is  the  cardinality  of  the 
RIM  set).  The  function  becomes: 
f  ■  Z  *  n*n  +  L*n  +  C 

So  we  conclude  that  the  algorithm  has  the  worst  case 
asymptotical  time  complexity  behaviour  of  0(n2). 

First  Member  (first :R)  : 

This  operation,  given  a  relation,  finds  the  unique 
initial  member  of  that  relation,  if  there  exists  one  and  only 
one  initial  member  of  that  relation.  Otherwise  the  operation 
is  undefined.  The  algorithm  for  the  Bash-Incidence-Vector 
representation  is  given  below: 


1.  Do  stops  1  through  4  of  the  "initial  members" 
operation's  algorithm  (for  the  Bash-Incidence-Vector 
representation) . 

2.  Proceed  down  in  the  LEM  set  of  the  relation.  For 
each  individual  of  the  LEM  set  hash  to  the  RHT  with  the 
individual  found.  If  there  is  no  record  for  that  individual 
in  the  RIM  set  of  the  relation  then,  record  that  individual 
and  set  a  flag  to  true.  Find  the  next  individual's  record  in 
the  LEM  set  of  the  relation,  by  following  the  TASE  link  field 
of  the  current  record  of  LEM  set.  Continue  to  perform  the 
same  operation  on  the  individual  being  represented  by  that 
record.  If  any  subsequent  operation  results  with  another 
initial  member  (which  can  be  detected  by  checking  the  flag) 
call  the  error  routine,  else  continue  to  check  until  the  LEM 
set  of  the  relation  is  exhausted. 

3.  Return  the  individual  recorded  in  step  2  if  no  error 
occurs. 

In  fact  the  above  algorithm  is  the  worst  case  algorithm 
that  must  be  executed  in  the  absence  of  the  initial  members 
set  of  the  relation.  So  the  first  steps  of  the  above 
algorithm  should  be: 

1.  Concatenate  character  string  "init:"  with  the 
relation  identifier. 

2.  Bash  with  the  resulting  identifier  to  the  set  table. 
If  a  record  is  present  under  that  identifier,  follow  the  PSS 


field  of  that  record,  and  find  the  first  record  of  the  set 
structure. 

3.  Test  if  there  exists  any  other  record  in  this  set;  if 
so  call  the  error  routine,  else  follow  the  PML  field  of  the 
set's  record,  find  the  individual  and  return  that  individual 
as  the  result  of  the  operation. 

4.  if  no  record  is  found  in  step  2  (in  the  set  table) 
continue  with  the  previous  algorithm. 

Since  we  are  concerned  with  only  the  parts  of  the 
algorithms  that  cause  the  worst  case  behaviour,  we  can  focus 
on  the  previous  algorithm  for  the  time  complexity  analysis. 

Now  we  will  do  the  worst  case  asymptotical  time 
complexity  analysis  of  this  algorithm. 

As  can  be  seen,  step  2  of  the  previous  algorithm 
constitutes  the  asymptotical  behaviour  of  that  algorithm. 
The  complexity  function  can  be  written  as: 

f  -  K*n  +  C 

where  constant  K  is  the  number  of  memory  references  made  for 
each  LEM  set  record  found  and  constant  C  is  the  number  of 
memory  references  made  in  step  1. 

So  we  conclude  that  the  worst  case  asymptotical  time 
complexity  behaviour  of  that  algorithm  is  0(n). 

Now  we  have  to  define  the  algorithm  for  the  table 
representation.  We  will  again  use  the  SCHT  mechanism  in 
order  to  make  the  algorithm  efficient.  (We  can  use  the  same 


argument  we  have  used  in  the  "initial  members"  operation 
about  how  inefficient  the  algorithm  would  be  otherwise).  The 
steps  of  the  algorithm  are  given  below: 

1.  Find  the  first  record  of  the  table. 

2.  Proceed  in  the  table  by  following  the  link  fields 
between  the  records.  Extract  the  right  individuals 
represented  by  the  "right"  field  of  each  record,  hash  with 
each  right  individual  into  the  SCBT,  and  establish  its  SCHT 
record  in  the  SCHT.  Link  the  SCHT  records  to  each  other  by 
their  TASE  link  fields  as  they  are  created. 

3.  After  the  right  column  is  exhausted  begin  from  the 
beginning  of  the  relation  again.  Proceed  down  in  the  left 
column.  For  each  left  individual  found  in  the  same  manner  as 
it  was  done  for  the  right  individuals,  hash  into  the  SCHT  to 
check  if  a  record  is  already  present  for  that  individual  in 
the  SCHT  (i.e.,  that  is  effectively  looking  up  the  right 
column  in  an  efficient  way  because  we  established  the 
individuals  of  the  right  column  in  the  SCHT  in  step  2).  The 
first  time  a  record  is  found  in  the  SCHT  for  a  left 
individual  record  that  individual  and  set  a  flag  to  true. 

4.  if  in  any  of  the  subsequent  repetitions  of  step  3,  a 
record  is  found  in  the  SCHT  corresponding  to  a  left  indivi¬ 
dual  (i.e.,  while  flag  is  true)  call  the  error  routine.  If 
the  above  situation  does  not  occur  until  after  the  relation 
is  exhausted  (i.e.,  effectively  until  after  the  left  column 
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is  exhausted)  return  the  recorded  individual.  I£  there  does 
not  exist  any  initial  members  call  the  error  routine. 

5.  Disconnect  all  the  records  £rom  the  SCHT  (dispose 
them) . 

Clearly  steps  2  and  3  of  this  algorithm  require  one  to  go 
through  the  relation  once  so  the  complexity  function  becomes: 
f  -  2*K*p  +  L*n  +  C 

where  Constant  K  is  the  number  of  memory  references  made  for 
each  element  of  the  right  column  and  left  column  of  the  table 
in  step  2  and  3,  Constant  L  is  the  number  of  memory 
references  made  for  each  element  of  the  RIM  set/  and  constant 
C  is  the  number  of  memory  references  made  by  the  housekeeping 
operations.  Variable  ”p"  is  the  size  of  the  relation,  in 
other  words  the  number  of  tuples  in  the  relation.  Variable 
”n"  is  the  cardinality  of  the  RIM  set  of  the  relation. 

Me  know  that  in  the  worst  case  the  relation  may  be  equal 
to  the  cartesian  product  of  its  LEM  set  and  RIM  set  and, 
p  ■  n  *  n 

where  n  is  assumed  to  be  the  cardinality  of  both  the  LEM  and 
the  RIM  set  of  the  relation.  Let  T  ■  (2k  +  L) ,  so  under  the 
above  consideration  the  worst  case  complexity  function 
becomes : 

f  *  T*n*n  +  C 

So  we  conclude  that  the  algorithm  has  the  worst  case 
asymptotical  time  complexity  behaviour  of  0(n2). 
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APPENDIX  B 


THE  COMPLEXITY  FUNCTIONS  WITH  THE  PREDICTED  CONSTANTS 


HASH  INCIDENCE  VECTOR 

TABLE 

Function 

Application 

IS 

5n2 

R1  :x 

1.4n?  +  36n  +  28 

12n3  4-  15 

Rc 

3n2  +  36n  +  35 

4n2  +  32 

uning:R:x 

19n  +  37 

16n2  +  14 

non:R 

0.02n2  +  64n  +  8 

6n3  4-  50n2  4-  3n  4-  16 

R&S 

4 In  2  +  51n  4-  42 

7n4  4-  4n2  4-  1 

R-S 

87n2  +  42 

3n*  4-  4n2  4-  18 

r|s 

42n2  +  20n  4-  52 

14n3  +  30n2  +  18 

RVS  (set) 

28n  4-  36 

RAS  (set) 

28n  ♦  36 

R-S  (set) 

28n  +  36 

initsR 

23n  4-  27 

36n2  4-  16 

f inalsR 

46n  +  30 

50n2  4-  12n  4-  18 

f irst ;R 

12n  4*  13 

28n2  4-  12n  4-  11 

lem :  R 

18n  4-  25 

17n2  4-  12n  4-  14 

mem:R 

64n  4-  86 

33n2  +  15 

R\S 

64n  4-  43 

36n2  4-  18 

C/R\C 

48n  4-  24 

20n2  +  18 

#:C 

2n  +  14 

RS 

0.9n3  4-  0.15n2  4-  38 

14n4  4-  8n2  4-  18 

APPENDIX  B 
(Continued) 


HASH  INCIDENCE  VECTOR  TABLE 

r||S  152n*  +  0.15n2  +  18  98n2  +  18 

R#S  0.15n2  +  134n  +  39  64n2  +  18 

fan:R  0.15n3  +  42n2  +  66n  +  30 

san:R  0.15n3  +  0.8n2  +  34n  +  30  30n4  +  61n2  +  18 


APPENDIX  C 


PREPROCESSING  RULES 


(R  &  S)c  *■■■■>  Rc  &  Sc 

(R  |  S)c  **«■>  rc  |  sc 

(R  -  S)c  »■»■■>  rc  -  sc 

(sansR)c  ■**■■>  san:(Rc) 

(fan:R)c  ■■■««>  fan: (Rc) 

(RS) c  »»«■»>  (Sc)  (Rc) 

lem: (Rc)  ■■■■«>  rim:  Rc 

rim:(Rc)  »■««■>  lem:  r 

men: (Rc)  »»«■»>  mem:  R 

unimg:(Rc)  ■■■*»>  unimg':R 

Rc\C  *■■*•>  c/R 

C/RC  mmmm m>  R\C 

init: (Rc)  »»■■»>  rim:R  and  not  (lem:R) 

init:(R)  ■■■■*>  lero:R  and  not  (tim:R) 

final:  (Rc)  «■«■>  r  \  (not(lem:R)) 

final:R  ■■■■■>  (lem:r  and  not  (rim:R))  /  R 

(R  &  S)I:C  ■■*-«>  rjjC  &  Sl:C 

(R  |  S)  1  :C  —  ■— >  R1:C  |  Si:C 

(non:R)I:C  »■■■*>  lem:R  and  not  (Rl :C) 

lem: (R6S)  ■*■■■>  lem:R  and  lem:S 

lem:(R|S)  ■■■■■>  lem:R  or  lem:S 


rim: (R6S) 


■■■■■> 


rim:R  and  rim:S 


unimg (R&S) 
unimg (R | S) 
unimg  (R-S) 
lem:  (fan:R) 


!*■■> 


unimg :R  and  unimg :S 
unimg :R  or  unimg :S 
unimg :R  -  unirag :S 


mem:R 


lem:  (san:R) 
rim: (fan:R) 
rim: (san:R) 
unimg  (R|  |  S) 
unimg  (R#S) 
unimg ' (R | | S) 
unimg' (R#S) 
rim: (R\C) 
rim:  (C/R) 
lem: (R\C) 
lem:  (C/R) 


aaasa> 


«aaa> 


lem:R 


mem:R 


rim:R 

(R| | S)  : 

(R#S)  : 

(R|  |  S)c: 
(R#S) c  : 
rim:R  and  C 
rim:R 
lem:R 

lem:R  and  C 


mem:R 


lem:R  or  rim:R 


lem: (£inal:R) 


init :R 


(non:R)c 
rim:  (£inal:R) 


non: (Rc) 
rim:R 


mem: (final :R) 


rim  :R  or  (lem:R  and  not 


init: (£inal:R) 


(rim:R) ) 
init :R 


R1 :  (rim:R) 


lem  :R 


lem: (non:R) 


(non:R) 1 : (rim:R) 


(46)  s 

(47) : 

(48) : 


ria:  (non:R)  (non:Rc)  l :  (lea:R) 

(uniag: (non:R) ) :x  ■->  (lea:R  -  (uniag:R:x)) 
(unlag 1 : (non: R) ) :x  »>  (ria:R  -  (unlag ' :R:x) ) 


it* 


APPENDIX  D 


MEMBERSHIP  TEST  ALGORITHMS  CONTINUED 
R1 :C  (R  is  extensionally  represented  relation)  : 

Pr  (uning'  :R:z) - set - >  Cl 

Fr(C) - Test-each-in - >  Cl any — >true{z  is  in  the  set.} 

(non:R) 1 :C  : 

Fr(ria:R) - - set 

Fr (uniag* :R:z)  - - set 

while  C2*y  C1-C2'  —set 
Fr  (C) - test-each- in— 

(non:R)cl:C  : 

Fr(lea:R) - - — set- — >  Cl 

Fr  (uniag  :R:z) - - - set — ->  C2 ' 

while  C2* ;  C1-C2' - set - >  D 

Fr(C)  —test-each-in — ->  D  — -any — >  true{z  is  in  the  set} 


- >  Cl 

— >  C2 ' 

- >  D 

- >  D  —  any— >  true{z  is  in  the  set} 


uning' : (R |  |  S)  :x 


Pr (uning :R: left (z) ) - set— >  Cl 

Pr (uning :S: right  (z))  —set - >  C2 

left(x)  — is— in— >  Cl  — tx - >  true — tx — >varA 

right  (x) — is— in— >  C2  — tx - >  true — tx — >VarB 

varA  and  varB  — — tx—  >true{z  is  in  the  set} 

uning' : (R#S) :x  : 

Pr  (uning' iRsleft (x) ) 

Pr (uning* :S: right (x) ) 

Cl  and  C2  - 

z - is — in — >D  — -tx 

uning' : (RS) :x  : 

Pr  (uning' :Rsx) - — 

Pr  (uning  :S:z)  - 

while  C2' /  Cl  and  C2' 
while  O';  isenpty(D') 


—set— — >  Cl 

- set - >  C2 ' 

- tx - >  D' 

— -tx— >  false{z  is  in  the  set} 


- set- - >  Cl 

— — set- - >  C2 

- set - >  d 

— >  true{z  is  in  the  set} 


Pc  (uning '  iR:  z) - -set - >  cl 

Pc  (an lag ' :S:z) - set - >  C21 

while  C2 ' ;  Cl  -  C2* - tx - >  D' 

while  D'j  iseapty(D')  -tx— — >  false{z  is  in  the  set} 

lesu(non:R)  : 

Pr(ria:R) - set - >  Cl 

Pr  (uning '  :R:z) - -set - >  C2 ' 

while  C2'|  Cl  -  C2'-set - >  O* 

w..ile  D';  isenpty (O' ) tx - >  false{z  is  in  the  set} 

lea:(R||S)  : 

left  (a) - is — in - >  len:R  —  tx - >  true  — tx - >varA 

right  (s)  - - is — in - >  lem:S  — tx - >  true  —  tx - >varB 


war A  and  varB  — tx - >  true{s  is  in  the  set} 

lem:(R#S)  : 

Pr  (uning '  :R:z)  - set - >  cl 

Pr  (uniag '  :S:z)  - set- - >  C2' 


while  C2 ' ;  cl  and  C2' tx - >  D' 

while  D':  iseapty (O') tx - >  falseU  i»  in  the  set} 


lea: (RS) 


Fr(lea:S)  - set - >  Cl 

Fr  (uniag':S:z) -test-each-in— ->C1— -any — >true{z  is  in  the  set} 

ria: (R-S)  : 

Fr  (uniag  :R:  z) - - 

Fc  (uning  :S:z)  - 

while  C2' ;  Cl  -  C2* 
while  D'j  iseapty(D') 

ria:  (non:R)  : 

HP 

Fr  ( lea :  R)  - — 

Fr  (uning  :R:z)  - 

while  C2'»  Cl  -  C2'  - 
while  D'»  iseapty(D') 

ria:(R||3)  : 

left  (z) - is-in - >  ria:R - tx — >  true  — tx — >  varA 

right (z)  — is-in - >  ria:S - tx — >  true - tx — >  varB 

varA  and  varB - tx - >  true{z  is  in  the  set} 


rins  (R#S) 


z - is-in - >  rimsR  — -tx — >  true tx — >  varA 

z  — -is-in- — >  riasS - tx— >  true  — -tx — >  varB 

varA  and  varB  - tx - >  true{z  is  in  the  set} 

ria:  (RS)  : 
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